R, Ruby, Perl und ich: R - Representation of microarray data

following chapter 7 of Draghici's "Statistics and Data Analysis for Microarrays Using R and Bioconductor"

using the data from T. Golub's 1999 paper on leukemia classification
the data are contained in the Bioconductor package golubEsets, which can be installed with

source("http://bioconductor.org/biocLite.R")
biocLite("golubEsets")
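biocLite() has since been retired: from Bioconductor 3.8 (R 3.5) onward, packages are installed through the BiocManager package from CRAN. A minimal sketch of the current equivalent, which only installs what is not already present:

```r
# install BiocManager from CRAN if it is missing
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
# install the golubEsets data package from Bioconductor if it is missing
if (!requireNamespace("golubEsets", quietly = TRUE)) {
  BiocManager::install("golubEsets")
}
library(golubEsets)
```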

then load the package and the data

require(golubEsets)
data(Golub_Merge)
Golub_Merge

ExpressionSet (storageMode: lockedEnvironment)
assayData: 7129 features, 72 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: 39 40 ... 33 (72 total)
  varLabels: Samples ALL.AML ... Source (11 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: 10521349 
Annotation: hu6800

Golub_Merge is of class ExpressionSet
the ExpressionSet class is part of the Biobase package
it is designed to combine several different sources of information into a single structure
and is the input for many Bioconductor functions
it consists of:
- expression data from microarray experiments (assayData)
- metadata describing the samples in the experiment (phenoData)
- annotations and metadata about the features on the chip or the technology used for the experiment (featureData, annotation)
- information about the protocol used to process each sample, usually extracted from manufacturer files (protocolData)
- a flexible structure to describe the experiment (experimentData)
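To see how these pieces fit together, here is a small sketch that builds a toy ExpressionSet by hand with Biobase. The probe IDs, sample names, and values are made up for illustration; only the annotation string hu6800 is taken from Golub_Merge.

```r
library(Biobase)

# toy expression matrix: rows are features (probe sets), columns are samples
set.seed(42)
m <- matrix(round(rnorm(20, mean = 200, sd = 100)), nrow = 5, ncol = 4,
            dimnames = list(paste0("probe", 1:5),   # hypothetical probe IDs
                            paste0("s", 1:4)))      # hypothetical sample names

# sample-level metadata; row names must match the column names of m
pd <- data.frame(ALL.AML = factor(c("ALL", "ALL", "AML", "AML")),
                 row.names = colnames(m))

eset <- ExpressionSet(assayData = m,                          # assayData
                      phenoData = AnnotatedDataFrame(pd),     # phenoData
                      experimentData = MIAME(title = "toy example"),
                      annotation = "hu6800")                  # chip annotation

eset          # printed summary in the same layout as Golub_Merge
dim(eset)     # Features 5, Samples 4
exprs(eset)   # the matrix m
pData(eset)   # the data frame pd
```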
experimentData() returns the experiment-level metadata, including the PubMed ID

experimentData(Golub_Merge)

Experiment data
  Experimenter name: Golub TR et al. 
  Laboratory: Whitehead 
  Contact information: 
 
  Title: ALL/AML discrimination 
  URL: www-genome.wi.mit.edu/mpr/data_set_ALL_AML.html 
  PMIDs: 10521349 

  Abstract: A 133 word abstract is available. Use 'abstract' method.

show the beginning of the abstract (its first 102 characters)

substr(abstract(Golub_Merge),1,102)

[1] "Although cancer classification has improved over the past 30 years, there has been no general approach"
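experimentData() reports a 133-word abstract. A rough way to count words yourself is to split on whitespace; this is only a sketch, and whether it reproduces Biobase's count exactly depends on how Biobase tokenizes the text, which I have not checked. Here it is applied to the excerpt printed above:

```r
# count whitespace-separated words in a single character string
count_words <- function(txt) {
  length(strsplit(trimws(txt), "[[:space:]]+")[[1]])
}

excerpt <- "Although cancer classification has improved over the past 30 years, there has been no general approach"
nchar(excerpt)        # 102, matching substr(..., 1, 102)
count_words(excerpt)  # 16

# on the full text (not run here): count_words(abstract(Golub_Merge))
```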

we can get the dimensions of the expression matrix (rows are features, columns are samples)

dim(exprs(Golub_Merge))

[1] 7129   72

look at the first five rows of the first five columns

exprs(Golub_Merge)[1:5,1:5]

                 39   40   42   47   48
AFFX-BioB-5_at -342  -87   22 -243 -130
AFFX-BioB-M_at -200 -248 -153 -218 -177
AFFX-BioB-3_at   41  262   17 -163  -28
AFFX-BioC-5_at  328  295  276  182  266
AFFX-BioC-3_at -224 -226 -211 -289 -170
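exprs() returns an ordinary numeric matrix with probe set IDs as row names and sample names as column names, so the usual matrix tools apply. A small sketch using the 5 x 5 block printed above, typed in by hand; note that many values are negative, so these numbers cannot be log-transformed without some preprocessing first:

```r
# the 5 x 5 block printed above, re-entered by hand
m <- matrix(c(-342,  -87,   22, -243, -130,
              -200, -248, -153, -218, -177,
                41,  262,   17, -163,  -28,
               328,  295,  276,  182,  266,
              -224, -226, -211, -289, -170),
            nrow = 5, byrow = TRUE,
            dimnames = list(c("AFFX-BioB-5_at", "AFFX-BioB-M_at", "AFFX-BioB-3_at",
                              "AFFX-BioC-5_at", "AFFX-BioC-3_at"),
                            c("39", "40", "42", "47", "48")))

m["AFFX-BioC-5_at", "39"]        # index by probe ID and sample name: 328
sum(m < 0)                       # 16 of the 25 values are negative
colMeans(m)                      # per-sample mean of these five probe sets
suppressWarnings(log2(m[1, ]))   # NaN wherever the value is negative

# same indexing on the real object (not run here):
# exprs(Golub_Merge)["AFFX-BioC-5_at", "39"]
```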

retrieve information on experimental phenotypes (again we look only at the first five samples/rows)

pData(Golub_Merge)[1:5,]

   Samples ALL.AML BM.PB T.B.cell  FAB      Date Gender pctBlasts Treatment   PS Source
39      39     ALL    BM   B-cell <NA>                F        NA      <NA> 0.78   DFCI
40      40     ALL    BM   B-cell <NA> 5/16/1980      F        NA      <NA> 0.68   DFCI
42      42     ALL    BM   B-cell <NA>      <NA>      F        NA      <NA> 0.42   DFCI
47      47     ALL    BM   B-cell <NA>  9/5/1986      M        NA      <NA> 0.81   DFCI
48      48     ALL    BM   B-cell <NA> 2/28/1992      F        NA      <NA> 0.94   DFCI
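Because phenoData travels with the expression matrix, an ExpressionSet can be subset like a matrix (features first, samples second), and the phenotype table is subset along with it. A minimal sketch on a small hand-built toy object with made-up values and sample names, not the Golub data; on the real data the same idiom would be Golub_Merge[, Golub_Merge$ALL.AML == "AML"]:

```r
library(Biobase)

# toy ExpressionSet: 3 hypothetical probe sets x 4 hypothetical samples
m <- matrix(seq(10, 120, by = 10), nrow = 3,
            dimnames = list(paste0("probe", 1:3), paste0("s", 1:4)))
pd <- data.frame(ALL.AML = factor(c("ALL", "AML", "ALL", "AML")),
                 row.names = colnames(m))
eset <- ExpressionSet(assayData = m, phenoData = AnnotatedDataFrame(pd))

# keep only the AML samples; exprs() and pData() are subset together
aml <- eset[, eset$ALL.AML == "AML"]
sampleNames(aml)   # "s2" "s4"
exprs(aml)         # columns s2 and s4 of m
pData(aml)         # the matching rows of pd

# features are subset the same way, e.g. the first two probe sets
eset[1:2, ]
```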

how are the assay reporters named?

featureNames(Golub_Merge)[1:5]

[1] "AFFX-BioB-5_at" "AFFX-BioB-M_at" "AFFX-BioB-3_at" "AFFX-BioC-5_at"
[5] "AFFX-BioC-3_at"
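The AFFX- prefix marks Affymetrix control probe sets (BioB and BioC, for example, are spike-in hybridization controls) rather than probes for human genes, so it is often useful to flag them. A small sketch; the third ID below is a made-up placeholder for a non-control probe set:

```r
# TRUE for Affymetrix control probe sets, identified by their "AFFX-" prefix
is_affx_control <- function(ids) startsWith(ids, "AFFX-")

ids <- c("AFFX-BioB-5_at", "AFFX-BioC-3_at", "hypothetical_gene_at")
is_affx_control(ids)   # TRUE TRUE FALSE

# on the real data (not run here):
# sum(is_affx_control(featureNames(Golub_Merge)))
# Golub_Merge[!is_affx_control(featureNames(Golub_Merge)), ]
```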

how are the samples named?

sampleNames(Golub_Merge)

 [1] "39" "40" "42" "47" "48" "49" "41" "43" "44" "45" "46" "70" "71" "72" "68"
[16] "69" "67" "55" "56" "59" "52" "53" "51" "50" "54" "57" "58" "60" "61" "65"
[31] "66" "63" "64" "62" "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"
[46] "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
[61] "27" "34" "35" "36" "37" "38" "28" "29" "30" "31" "32" "33"
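The samples are not stored in numeric order, and since the names are character strings, sort() orders them lexically ("1", "10", "11", ...). A sketch that reorders them numerically, using the names printed above:

```r
sn <- c("39","40","42","47","48","49","41","43","44","45","46","70","71","72","68",
        "69","67","55","56","59","52","53","51","50","54","57","58","60","61","65",
        "66","63","64","62","1","2","3","4","5","6","7","8","9","10","11",
        "12","13","14","15","16","17","18","19","20","21","22","23","24","25","26",
        "27","34","35","36","37","38","28","29","30","31","32","33")

head(sort(sn))                 # lexical order: "1" "10" "11" "12" "13" "14"
ord <- order(as.integer(sn))   # positions of the samples in numeric order
head(sn[ord])                  # "1" "2" "3" "4" "5" "6"

# on the real object (not run here), reorder samples and phenoData together:
# Golub_Merge[, order(as.integer(sampleNames(Golub_Merge)))]
```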

show the distribution of the primary outcome

table(Golub_Merge$ALL.AML)

ALL AML 
 47  25
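To express the class balance as proportions, wrap the table in prop.table(). A small self-contained sketch that rebuilds the 47/25 split from the counts above; on the real data one would call prop.table(table(Golub_Merge$ALL.AML)) directly:

```r
# rebuild the ALL/AML labels from the counts above (47 ALL, 25 AML)
all_aml <- factor(rep(c("ALL", "AML"), times = c(47, 25)))

tab <- table(all_aml)
tab                        # ALL 47, AML 25
round(prop.table(tab), 3)  # ALL 0.653, AML 0.347
```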

Sunday, July 7, 2013