Notes to go along with chapter 7 of Draghici: Statistics and Data Analysis for Microarrays Using R and Bioconductor
- using the data of T. Golub's paper from 1999 on leukemia classification
- the data is contained in a package and can be installed
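source("http://bioconductor.org/biocLite.R")
biocLite("golubEsets")
# on current Bioconductor releases, BiocManager replaces biocLite():
# install.packages("BiocManager"); BiocManager::install("golubEsets")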
- then load the package and the data
require(golubEsets)
data(Golub_Merge)
Golub_Merge
ExpressionSet (storageMode: lockedEnvironment)
assayData: 7129 features, 72 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: 39 40 ... 33 (72 total)
  varLabels: Samples ALL.AML ... Source (11 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: 10521349
Annotation: hu6800
- Golub_Merge is of class ExpressionSet
- the ExpressionSet class is part of the Biobase package
- this class is designed to combine several different sources of information into one single structure
- it is the input for many Bioconductor functions
- it consists of:
  - expression data from microarray experiments (assayData)
  - metadata describing the samples in the experiment (phenoData)
  - annotations and metadata about the features on the chip or technology used for the experiment (featureData, annotation)
  - information related to the protocol used for processing each sample, usually extracted from manufacturer files (protocolData)
  - and a flexible structure to describe the experiment (experimentData)
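To make the slots listed above concrete, here is a minimal sketch that assembles a tiny ExpressionSet by hand with the Biobase constructors ExpressionSet(), AnnotatedDataFrame() and MIAME(). The object name toy, the probe names, the sample names and all values are made up for illustration; only the slot layout mirrors Golub_Merge.

```r
library(Biobase)

# made-up expression matrix: 4 features (rows) x 3 samples (columns)
expr <- matrix(c(10, 20, 30, 40,
                 11, 21, 31, 41,
                 12, 22, 32, 42),
               nrow = 4,
               dimnames = list(paste0("probe", 1:4), c("s1", "s2", "s3")))

# made-up sample annotation; row names must match the matrix column names
pheno <- data.frame(ALL.AML = factor(c("ALL", "ALL", "AML")),
                    Gender  = c("F", "M", "F"),
                    row.names = colnames(expr))

# hypothetical experiment-level description
exp_info <- MIAME(name = "hypothetical experimenter", title = "toy example")

toy <- ExpressionSet(assayData = expr,
                     phenoData = AnnotatedDataFrame(pheno),
                     experimentData = exp_info)

dim(exprs(toy))       # expression matrix: features x samples
pData(toy)            # one row of annotation per sample
experimentData(toy)   # experiment-level metadata
```

The same accessors used on Golub_Merge in these notes work on this toy object, which makes it a convenient sandbox before touching the full data set.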
- so we can get experiment-level metadata along with the PubMed ID
experimentData(Golub_Merge)
Experiment data
  Experimenter name: Golub TR et al.
  Laboratory: Whitehead
  Contact information:
  Title: ALL/AML discrimination
  URL: www-genome.wi.mit.edu/mpr/data_set_ALL_AML.html
  PMIDs: 10521349
  Abstract: A 133 word abstract is available. Use 'abstract' method.
- show the first part of the abstract
substr(abstract(Golub_Merge),1,102)
[1] "Although cancer classification has improved over the past 30 years, there has been no general approach"
- one can get the dimensions of the expression data (features x samples)
dim(exprs(Golub_Merge))
- look at the first five rows and the first five columns
exprs(Golub_Merge)[1:5,1:5]
                 39   40   42   47   48
AFFX-BioB-5_at -342  -87   22 -243 -130
AFFX-BioB-M_at -200 -248 -153 -218 -177
AFFX-BioB-3_at   41  262   17 -163  -28
AFFX-BioC-5_at  328  295  276  182  266
AFFX-BioC-3_at -224 -226 -211 -289 -170
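Since exprs() returns an ordinary numeric matrix with features in rows and samples in columns, the usual base R matrix tools apply to it. The sketch below rebuilds only the 5 x 5 corner printed above as a plain matrix (the name m is made up here) so it runs without the package; on the real data one would start from exprs(Golub_Merge) instead.

```r
# the 5 x 5 corner shown above, entered column by column
m <- matrix(c(-342, -200,   41, 328, -224,
               -87, -248,  262, 295, -226,
                22, -153,   17, 276, -211,
              -243, -218, -163, 182, -289,
              -130, -177,  -28, 266, -170),
            nrow = 5,
            dimnames = list(c("AFFX-BioB-5_at", "AFFX-BioB-M_at",
                              "AFFX-BioB-3_at", "AFFX-BioC-5_at",
                              "AFFX-BioC-3_at"),
                            c("39", "40", "42", "47", "48")))

range(m)      # smallest and largest value: -342 328
rowMeans(m)   # mean per feature across the five samples
sum(m < 0)    # number of negative entries in this corner: 16
```

Note that this corner consists of AFFX control probes only, so these numbers say nothing about the data set as a whole.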
- retrieve information on experimental phenotypes (again we look only at the first five samples/rows)
pData(Golub_Merge)[1:5,]
   Samples ALL.AML BM.PB T.B.cell  FAB      Date Gender pctBlasts Treatment
39      39     ALL    BM   B-cell <NA>                F        NA      <NA>
40      40     ALL    BM   B-cell <NA> 5/16/1980      F        NA      <NA>
42      42     ALL    BM   B-cell <NA>      <NA>      F        NA      <NA>
47      47     ALL    BM   B-cell <NA>  9/5/1986      M        NA      <NA>
48      48     ALL    BM   B-cell <NA> 2/28/1992      F        NA      <NA>
     PS Source
39 0.78   DFCI
40 0.68   DFCI
42 0.42   DFCI
47 0.81   DFCI
48 0.94   DFCI
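One practical benefit of keeping phenotypes and expression values in one object is that subsetting the ExpressionSet itself keeps both in sync. The following is a small sketch on made-up data (toy, expr, pheno and the sample names s1 to s4 are hypothetical); applied to the real object, the same pattern would read Golub_Merge[, Golub_Merge$ALL.AML == "AML"].

```r
library(Biobase)

# made-up data: 3 probes x 4 samples
expr <- matrix(as.numeric(1:12), nrow = 3,
               dimnames = list(paste0("probe", 1:3), paste0("s", 1:4)))
pheno <- data.frame(ALL.AML = factor(c("ALL", "AML", "ALL", "AML")),
                    row.names = colnames(expr))
toy <- ExpressionSet(assayData = expr,
                     phenoData = AnnotatedDataFrame(pheno))

# rows index features, columns index samples, as for a matrix;
# toy$ALL.AML is a shortcut for pData(toy)$ALL.AML
aml <- toy[, toy$ALL.AML == "AML"]

sampleNames(aml)   # only the AML samples remain
exprs(aml)         # expression columns for those samples
pData(aml)         # matching phenotype rows
```

Subsetting exprs() and pData() separately would also work, but then it is up to the user to keep the two in the same order.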
- how are the assay reporters named?
featureNames(Golub_Merge)[1:5]
[1] "AFFX-BioB-5_at" "AFFX-BioB-M_at" "AFFX-BioB-3_at" "AFFX-BioC-5_at"
[5] "AFFX-BioC-3_at"
- how are the samples named?
sampleNames(Golub_Merge)
 [1] "39" "40" "42" "47" "48" "49" "41" "43" "44" "45" "46" "70" "71" "72" "68"
[16] "69" "67" "55" "56" "59" "52" "53" "51" "50" "54" "57" "58" "60" "61" "65"
[31] "66" "63" "64" "62" "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"
[46] "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
[61] "27" "34" "35" "36" "37" "38" "28" "29" "30" "31" "32" "33"
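The sample names are character strings and are not stored in numeric order (sample "1" sits at position 35 in the listing above), so indexing by name and indexing by position can select different columns. Here is a small base R sketch using only the first five samples and probes printed earlier; the matrix name m is made up for this example.

```r
# first five probes and samples of the expression matrix, as printed earlier
m <- matrix(c(-342, -200,   41, 328, -224,
               -87, -248,  262, 295, -226,
                22, -153,   17, 276, -211,
              -243, -218, -163, 182, -289,
              -130, -177,  -28, 266, -170),
            nrow = 5,
            dimnames = list(c("AFFX-BioB-5_at", "AFFX-BioB-M_at",
                              "AFFX-BioB-3_at", "AFFX-BioC-5_at",
                              "AFFX-BioC-3_at"),
                            c("39", "40", "42", "47", "48")))

which(colnames(m) == "42")    # sample "42" is the 3rd column: 3
m[, "42"]                     # select by sample name (character)
m[, 3]                        # same column, selected by position
m["AFFX-BioC-5_at", "47"]     # a single value by probe and sample name: 182
```

Quoting the sample name is what matters here: m[, "42"] looks up a name, while m[, 42] would ask for the 42nd column.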
- show the distribution of the primary outcome
table(Golub_Merge$ALL.AML)
ALL AML
 47  25
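Golub_Merge$ALL.AML is a shortcut for pData(Golub_Merge)$ALL.AML, so the result is a plain table that can be processed further. As a sketch that runs without the package, the factor below (named outcome here, a made-up stand-in) is rebuilt from the two counts printed above and turned into proportions.

```r
# stand-in for Golub_Merge$ALL.AML, rebuilt from the printed counts
outcome <- factor(rep(c("ALL", "AML"), times = c(47, 25)))

counts <- table(outcome)
counts                          # ALL 47, AML 25
round(prop.table(counts), 3)    # ALL 0.653, AML 0.347
```

About two thirds of the 72 samples are ALL, which is worth keeping in mind when judging classifier accuracy on this data.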