Thursday, June 20, 2013

ls - show only folders

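use du: it reports directories (with their disk usage) rather than files, and --max-depth=1 limits the listing to the first level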

du ~/ --max-depth=1
22576   /home/mandy/.mozilla
4       /home/mandy/Öffentlich
2956    /home/mandy/.config
28      /home/mandy/.icons
16      /home/mandy/.adobe
4714980 /home/mandy/klinik
4       /home/mandy/Vorlagen
4       /home/mandy/fserver
12      /home/mandy/.dbus
4       /home/mandy/Dokumente
19628   /home/mandy/Downloads
120     /home/mandy/.rstudio-desktop
36      /home/mandy/.compiz
5258420 /home/mandy/ggmbh
2940    /home/mandy/.local
44      /home/mandy/.emacs.d
8       /home/mandy/.ssh
8       /home/mandy/.gnome2
44      /home/mandy/.gnupg
4       /home/mandy/Videos
356     /home/mandy/Arbeitsfläche
52      /home/mandy/.gconf
15245532        /home/mandy/fileserver
4       /home/mandy/.hplip
95084   /home/mandy/.cpan
4       /home/mandy/Musik
64      /home/mandy/.macromedia
4       /home/mandy/Ubuntu One
8590040 /home/mandy/servercn
628     /home/mandy/Bilder
4       /home/mandy/.gnome2_private
4       /home/mandy/server170
110976  /home/mandy/R
107880  /home/mandy/.cache
36      /home/mandy/.cpanm
28      /home/mandy/.shutter
92      /home/mandy/.thumbnails
34173652        /home/mandy/
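
From within R, a folders-only listing can be sketched with list.dirs(). The demo folder and its contents below are made up for illustration; for the real thing, point list.dirs() at "~" instead.

```r
# Build a small throw-away directory tree (hypothetical names, illustration only)
root <- file.path(tempdir(), "ls_demo")
dir.create(file.path(root, "Downloads"), recursive = TRUE)
dir.create(file.path(root, ".config"))
file.create(file.path(root, "notes.txt"))   # a plain file, should not be listed

# recursive = FALSE corresponds to du's --max-depth=1; hidden folders are included
folders <- basename(list.dirs(root, recursive = FALSE))
print(folders)
```

Unlike du, this gives no sizes, only the folder names.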

Wednesday, June 19, 2013

R - Grammar of Graphics Figure 2.1


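library(diagram)   # provides openplotmat(), coordinates(), straightarrow(), textrect(), textellipse()
par(mar=c(1,1,1,1))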
openplotmat()
elpos <- coordinates(3)
fromto <- matrix(ncol=2,byrow = T,data=c(1,2,2,3))
nr <- nrow(fromto)
arrpos <- matrix(ncol=2,nrow=nr)

for(i in 1:nr){
    arrpos[i,] <- straightarrow(to=elpos[fromto[i,2],],
                                from=elpos[fromto[i,1],],
                                lwd = 2, arr.pos = 0.68,
                                arr.length = 0.5)
}

textrect(elpos[1,],0.1,0.09,lab="Source",shadow.col = NULL)
textellipse(elpos[2,],0.1,0.09,lab="Make a pie",shadow.col = NULL)
textrect(elpos[3,],0.1,0.09,lab="Render",shadow.col = NULL)

text(arrpos[1,1]-0.07,arrpos[1,2]-0.08,"Data")
text(arrpos[2,1]-0.06,arrpos[2,2]-0.08,"Graphics")


if installing perl modules via CPAN does not work


  • check whether the proxy is configured correctly (in the cpan shell: o conf init /proxy/)
  • if that does not help, try removing the files in ~/.cpan/sources/modules

Tuesday, June 18, 2013

R - Grammar of Graphics Figure 1.1 redone with ggplot


An Example (1.4)


  • first we need the data: birth and death rates for the year 1990. We take them from the World Bank (and of course there is a package, WDI, providing direct access)
  • so load the package and download a list of indicators (WDIcache)
  • the resulting data frame contains indicator and name, and additional information (data description, source)

require(WDI)
indicators <- as.data.frame(WDIcache()$series)
names(indicators)
[1] "indicator"          "name"               "description"       
[4] "sourceDatabase"     "sourceOrganization"
  • then we have to find our two parameters of interest: the crude birth and the crude death rate:
    • we use grep on the column name to find them

grep("crude",indicators$name)
[1] 6553 6554
  • so we know that there are two lines containing the string crude in the name variable, so let's show them
indicators[grep("crude",indicators$name),]
indicator                                 name
6553 SP.DYN.CBRT.IN Birth rate, crude (per 1,000 people)
6554 SP.DYN.CDRT.IN Death rate, crude (per 1,000 people)
                                                                                                                                                                                                                                                                                                   description
6553 Crude birth rate indicates the number of live births occurring during the year, per 1,000 population estimated at midyear. Subtracting the crude death rate from the crude birth rate provides the rate of natural increase, which is equal to the rate of population change in the absence of migration.
6554      Crude death rate indicates the number of deaths occurring during the year, per 1,000 population estimated at midyear. Subtracting the crude death rate from the crude birth rate provides the rate of natural increase, which is equal to the rate of population change in the absence of migration.
                   sourceDatabase
6553 World Development Indicators
6554 World Development Indicators
                                                                                                                                                                                                                                                                                                                                                                                                                         sourceOrganization
6553 (1) United Nations Population Division. World Population Prospects, (2) United Nations Statistical Division. Population and Vital Statistics Reprot (various years), (3) Census reports and other statistical publications from national statistical offices, (4) Eurostat: Demographic Statistics, (5) Secretariat of the Pacific Community: Statistics and Demography Programme, and (6) U.S. Census Bureau: International Database.
6554 (1) United Nations Population Division. World Population Prospects, (2) United Nations Statistical Division. Population and Vital Statistics Reprot (various years), (3) Census reports and other statistical publications from national statistical offices, (4) Eurostat: Demographic Statistics, (5) Secretariat of the Pacific Community: Statistics and Demography Programme, and (6) U.S. Census Bureau: International Database.
  • now we know the indicators and can download the desired data (just in case, we download them for the years 1980–2012)
rate.data <- WDI(country="all",indicator=indicators[grep("crude",indicators$name),1],start=1980,end=2012)
rate.data.1990 <- rate.data[rate.data$year==1990,]
head(rate.data.1990)
iso2c                                 country year SP.DYN.CBRT.IN
11     1A                              Arab World 1990       34.71824
44     1W                                   World 1990       25.85034
77     4E   East Asia & Pacific (developing only) 1990       22.84601
110    7E Europe & Central Asia (developing only) 1990       17.90782
143    8S                              South Asia 1990       32.91213
176    AD                                 Andorra 1990             NA
    SP.DYN.CDRT.IN
11        8.262026
44        9.266023
77        6.997506
110      10.065765
143      10.703840
176             NA


  • now we have to remove aggregations from these data (such as the European Union, Middle East, etc)
    • we use grepl on the iso2c column (it does the same as grep but returns logical values; of course you can also use grep again)
    • iso2c is a standardized two-character country code
    • we drop all rows whose iso2c starts with X, equals EU, contains a digit, or equals ZQ, ZJ, ZG, or ZF

rate.data.1990 <- rate.data.1990[!grepl("^X|EU|\\d|Z[QJGF]",rate.data.1990$iso2c,perl=T),]
head(rate.data.1990)


    iso2c              country year SP.DYN.CBRT.IN SP.DYN.CDRT.IN
176    AD              Andorra 1990             NA             NA
209    AE United Arab Emirates 1990         25.916          2.794
242    AF          Afghanistan 1990         52.449         22.062
275    AG  Antigua and Barbuda 1990         20.100          6.800
308    AL              Albania 1990         24.610          5.909
341    AM              Armenia 1990         21.215          7.738
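
The filter pattern used above can be checked on a handful of codes without downloading anything. The codes below are picked for illustration only (some appear in the output above, the rest are assumptions); the sketch also shows how grepl and grep relate.

```r
# Illustrative two-character codes: aggregates and ordinary countries mixed
codes   <- c("1A", "XC", "EU", "ZQ", "Z4", "DE", "AD", "ZA", "MX")
pattern <- "^X|EU|\\d|Z[QJGF]"

# grepl returns one logical per element ...
is_aggregate <- grepl(pattern, codes, perl = TRUE)
# ... while grep returns the positions of the matches
agg_index <- grep(pattern, codes, perl = TRUE)

countries <- codes[!is_aggregate]
print(countries)
```

Note that "MX" survives because X is only matched at the start, and "ZA" survives because A is not in the set [QJGF].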
  • so that's the data frame
  • now the graphics
require(ggplot2)
require(gridExtra)
require(scales)

rate.data.1990$lab <- ifelse(rbinom(size=1,n=nrow(rate.data.1990),prob=0.2)==1,rate.data.1990$country,"")
ggplot(rate.data.1990,aes(x=SP.DYN.CBRT.IN,y=SP.DYN.CDRT.IN)) +
    stat_density2d(aes(colour= ..level..),bins=6,h=c(11,9),geom="density2d") +
    scale_x_continuous("Birth Rate", limits = c(0,60),breaks=seq(0,60,by=10),expand=c(0,1)) +
    scale_y_continuous("Death Rate", limits = c(0,30),breaks=seq(0,30,by=10),expand=c(0,1)) +
    scale_colour_gradientn(colours=c("blue","green","lightgreen","red"),guide="none") +
    geom_text(aes(label=lab),size=3,position = "jitter") +
    geom_abline(intercept=0,slope=1) +
    coord_fixed() +
    annotate("text",x=20,y=20,angle=45, label="Zero Population Growth",vjust=-0.5,size=4) +
    theme(
        panel.background=element_blank(),
        panel.border=element_rect(colour="black",fill="transparent"),
        axis.text = element_text(colour="black"),
        axis.ticks = element_line(colour="black"),
        axis.line = element_line(colour="black")
        ) 


  • note:
    • the plot in chapter 1.4 of the book uses an Epanechnikov kernel, whereas ggplot uses a normal (Gaussian) one
    • I randomly chose about 20% of the country names as labels (because I was too lazy to pick exactly those used in the book)
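
To get a feeling for how much the kernel choice matters, here is a minimal one-dimensional sketch with stats::density(), which supports both kernels. This is not the 2D estimate used in the plot; the data are simulated and the bandwidth of 4 is an arbitrary assumption.

```r
set.seed(42)
x <- rnorm(200, mean = 25, sd = 10)   # simulated values, illustration only

# Same data and bandwidth, two different kernels
d_gauss <- density(x, bw = 4, kernel = "gaussian")
d_epan  <- density(x, bw = 4, kernel = "epanechnikov")

plot(d_gauss, main = "Gaussian vs. Epanechnikov kernel")
lines(d_epan, lty = 2)
legend("topright", legend = c("gaussian", "epanechnikov"), lty = c(1, 2))
```

With the same bandwidth the two curves are usually very close, so the difference to the figure in the book should be mostly cosmetic.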


Sunday, June 16, 2013

R - First steps working with microarray data

Working with Microarray Data

Following chapter 19 of Lewis, R for Medicine and Biology

raw CEL files

  • the Bioconductor packages affy and simpleaffy make it possible to work with raw microarray data
  • they also cover quality control and simple data analysis
#source("http://bioconductor.org/biocLite.R")
#biocLite("affy")
#biocLite("simpleaffy")
  • get some raw data

download.file(url="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5090/suppl/GSE5090_RAW.tar",destfile="arraydata.tar")
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5090/suppl/GSE5090_RAW.tar'
ftp data connection made, file length 63569920 bytes
opened URL
=================================================
downloaded 60.6 Mb
  • read in data
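library(simpleaffy)   # provides read.affy()
setwd("~/affyfiles")  # folder with the unpacked CEL files and the covdesc file
data.raw <- read.affy("covdesc")
setwd("~/")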
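
read.affy() expects a whitespace-delimited description file (here called covdesc) next to the CEL files: the first column holds the file names and has no header, the remaining columns are covariates. A minimal sketch of writing such a file, assuming the group column is called disease (as used in the pairwise comparison further down) and using only two of the samples for illustration:

```r
# Illustration only: a real covdesc needs one row per CEL file in the folder.
# The two sample/group pairs follow the GEO annotation of this series.
covdesc <- data.frame(
  disease   = c("control", "polycystic_ovary_syndrome"),
  row.names = c("GSM114841.CEL", "GSM114834.CEL")
)

# Written to a temporary folder here; normally it goes next to the CEL files
covdesc_path <- file.path(tempdir(), "covdesc")
write.table(covdesc, covdesc_path, sep = "\t", quote = FALSE)

cat(readLines(covdesc_path), sep = "\n")
```

The header line has one field fewer than the data lines, which is how the file-name column ends up without a name.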
  • the function read.affy() returns an object of class AffyBatch (which extends the eSet class from the Biobase package)
  • the cdfName slot: the Affymetrix array platform used
  • the nrow and ncol slots: number of rows and columns on the Affymetrix chip
data.raw@cdfName
[1] "HG-U133A"
data.raw@nrow
Rows 
 712
data.raw@ncol

Cols 
 712
  • so the HG-U133A array has 712 rows and 712 columns, i.e. 712 x 712 = 506944 spots per chip
  • obtain probe-level data for each individual spot using the intensity() function (for an AffyBatch, exprs() gives the same matrix)
    • e.g. intensity data for the last five spots on the first array of the data.raw object

intensity(data.raw)[506940:506944,1]
506940  506941  506942  506943  506944 
  109.0 10380.5   175.3 10044.3   158.0

Normalizing/Quality Control

  • QC methods:
    • density plots of PM intensities
    • RNA degradation plots
    • MvA plots
  • MvA plots

    • assess how similar the probe-level data are between samples of the same subgroup; the shape of the fitted curve reveals any intensity-dependent bias between samples
    • M is the log fold change between two samples (y-axis)
    • A is the mean log expression of the two samples (x-axis)
    setwd("~/affyfiles")
    require(simpleaffy)
    require(affy)
    data.raw <- read.affy("covdesc")
    setwd("../")
    
    MAplot(data.raw[,c(1,4)],pairs=TRUE)   # abatch[,i] instead of the deprecated abatch[i]
    

    • normalize data and plot again
    data.norm <- normalize(data.raw)
    MAplot(data.norm[,c(1,4)],pairs=TRUE)
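
What M and A mean can be sketched in base R with two simulated intensity vectors. This is an illustration of the definitions only, not what MAplot() does internally; the distributions and the bias are made up.

```r
set.seed(1)
x <- rlnorm(1000, meanlog = 6, sdlog = 1)             # simulated intensities, array 1
y <- x * rlnorm(1000, meanlog = 0.1, sdlog = 0.2)     # array 2: same signal, small bias and noise

M <- log2(x) - log2(y)            # log fold change between the two arrays (y-axis)
A <- (log2(x) + log2(y)) / 2      # mean log intensity of the two arrays (x-axis)

plot(A, M, pch = ".", main = "MvA plot (simulated data)")
abline(h = 0, col = "grey")               # no difference between the arrays
lines(lowess(A, M), col = "red")          # smoothed trend; a curve away from 0 indicates bias
```

For well-normalized arrays the smoothed curve should stay close to the horizontal line at M = 0.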
    

From probe-level data to expression data

  • create expression levels from the raw data
  • create a PMA detection call and the associated p-values from the first array
  data.mas5 <- call.exprs(data.raw,"mas5")                # expression levels (MAS 5.0)
  sample1 <- detection.p.val(data.raw[,1],calls=TRUE)     # PMA calls and p-values, first array
  cbind(sample1$call[1:10],sample1$pval[1:10])
      [,1] [,2]                 
 [1,] "P"  "0.00261701434183198"
 [2,] "P"  "0.00486279317241156"
 [3,] "P"  "0.00564280932389143"
 [4,] "P"  "0.00418036711173471"
 [5,] "A"  "0.378184514727312"  
 [6,] "P"  "0.00306673915877408"
 [7,] "P"  "0.00564280932389143"
 [8,] "A"  "0.0894050784997029" 
 [9,] "P"  "0.00116478895452843"
[10,] "A"  "0.0813367142140589"
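
How the calls follow from the p-values can be sketched in a few lines, assuming the MAS 5.0 default cut-offs alpha1 = 0.04 and alpha2 = 0.06 (present below alpha1, marginal between the two, absent otherwise). The helper pma_call() is hypothetical and only mimics the rule; the p-values are copied from the output above.

```r
# p-values of the first ten probe sets, copied from the output above
p <- c(0.00261701434183198, 0.00486279317241156, 0.00564280932389143,
       0.00418036711173471, 0.378184514727312,   0.00306673915877408,
       0.00564280932389143, 0.0894050784997029,  0.00116478895452843,
       0.0813367142140589)

# Hypothetical helper: P if p < alpha1, M if alpha1 <= p < alpha2, A otherwise
pma_call <- function(p, alpha1 = 0.04, alpha2 = 0.06) {
  ifelse(p < alpha1, "P", ifelse(p < alpha2, "M", "A"))
}

calls <- pma_call(p)
print(data.frame(call = calls, pval = p))
```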
  • qc() generates commonly used quality control metrics, including:
    • average background
    • scale factor
    • number of genes called present
    • 3' to 5' end ratios of the beta-actin and GAPDH genes

data.qc <- qc(data.raw,data.mas5)
plot(data.qc)



create a list of differentially expressed genes


  • data.mas5 holds the summarised expression values (an ExpressionSet rather than an AffyBatch), so the usual accessor functions apply
  • exprs() returns the expression-level data for the different probe sets
    • e.g. expression levels computed by the MAS 5.0 algorithm for the first 10 probe sets on the array for the first four samples
exprs(data.mas5)[1:10,1:4]
GSM114834.CEL GSM114840.CEL GSM114841.CEL GSM114842.CEL
1007_s_at      7.623914      7.767622      7.799031      7.991894
1053_at        4.748147      5.127184      5.149059      4.975323
117_at         5.552385      6.005922      5.376999      5.947452
121_at         7.789410      7.900269      7.845280      8.188128
1255_g_at      1.254626      3.157348      3.842650      3.800311
1294_at        6.594926      6.445978      7.237356      6.657093
1316_at        5.118859      5.160522      4.894248      4.714638
1320_at        5.110600      5.188031      5.133438      5.185311
1405_i_at      6.766398      6.799344      5.578731      5.805039
1431_at        2.900998      3.487406      4.493393      3.641552
  • the function pairwise.comparison() generates fold changes, t-tests, and means for pairs of experimental groups
  • pairwise.comparison() takes an exprSet object as input and returns an instance of the PairComp class
  • pairwise.filter() takes a PairComp object as input and filters by:
    • minimum expression level
    • minimum number of arrays in which a gene is called present across all groups or by group
    • minimum fold change
    • maximum t-test p-val for a gene

  • example: filter results of pairwise comparison using:
    • genes must be called present in at least three samples in one of the groups
    • t-test p-val must be less than 0.05
    • fold change between means of the two groups must be at least 1.5
pair <- pairwise.comparison(data.mas5,group="disease",members=c("control", "polycystic_ovary_syndrome"),spots=data.raw)
pair.filt <- pairwise.filter(pair,min.present.no=3,present.by.group=T,fc=log2(1.5),tt=0.05)
[1] "Checking member control in group: ' disease '"
[1] "Checking member polycystic_ovary_syndrome in group: ' disease '"
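
The fold-change and t-test part of this filter can be sketched in base R. This is not simpleaffy's implementation, just the same idea on a made-up log2 expression matrix with hypothetical gene names; the present-call criterion is left out.

```r
# Made-up log2 expression values: 3 genes, 4 control and 4 case samples
expr <- rbind(
  geneA = c(7.0, 7.1, 6.9, 7.0,   8.0,  8.1, 7.9, 8.0),
  geneB = c(5.0, 5.1, 4.9, 5.0,   5.05, 5.0, 5.1, 4.95),
  geneC = c(6.0, 7.5, 5.0, 6.5,   7.0,  5.5, 8.0, 6.7)
)
group <- rep(c("control", "case"), each = 4)
ctrl  <- expr[, group == "control"]
case  <- expr[, group == "case"]

# On the log2 scale the fold change is a difference of group means
fc <- rowMeans(ctrl) - rowMeans(case)
# One t-test per gene
p  <- sapply(rownames(expr), function(g) t.test(ctrl[g, ], case[g, ])$p.value)

# Keep genes with at least 1.5-fold change (either direction) and p < 0.05
keep <- abs(fc) >= log2(1.5) & p < 0.05
selected <- names(which(keep))
print(selected)
```

Because the data are on the log2 scale, a 1.5-fold change corresponds to log2(1.5), roughly 0.585, which is why the call above passes fc=log2(1.5).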
  • the pair.filt object is also an instance of the PairComp class
  • how many genes have been returned?:
nrow(pair.filt@means)
[1] 54
pair.filt@means[1:10,1:2]
control polycystic_ovary_syndrome
200879_s_at  7.198598                  6.585865
200951_s_at  4.417044                  5.126597
200974_at   10.134294                  9.549187
201242_s_at  7.949515                  8.556699
201468_s_at  8.909398                  9.676864
201496_x_at  6.486854                  5.289261
201497_x_at  9.179405                  8.257583
202040_s_at  6.770394                  7.379680
202104_s_at  6.888542                  6.274534
202274_at    6.554181                  5.746637
  • view the PMA detection calls using the calls slot
pair.filt@calls[1:10,]
GSM114834.CEL.present GSM114840.CEL.present GSM114841.CEL.present
200879_s_at "P"                   "P"                   "P"                  
200951_s_at "P"                   "P"                   "A"                  
200974_at   "P"                   "P"                   "P"                  
201242_s_at "P"                   "P"                   "P"                  
201468_s_at "P"                   "P"                   "P"                  
201496_x_at "P"                   "P"                   "P"                  
201497_x_at "P"                   "P"                   "P"                  
202040_s_at "P"                   "P"                   "P"                  
202104_s_at "P"                   "P"                   "P"                  
202274_at   "P"                   "A"                   "P"                  
            GSM114842.CEL.present GSM114843.CEL.present GSM114844.CEL.present
200879_s_at "P"                   "P"                   "P"                  
200951_s_at "P"                   "A"                   "A"                  
200974_at   "P"                   "P"                   "P"                  
201242_s_at "P"                   "P"                   "P"                  
201468_s_at "P"                   "P"                   "P"                  
201496_x_at "P"                   "P"                   "P"                  
201497_x_at "P"                   "P"                   "P"                  
202040_s_at "P"                   "P"                   "P"                  
202104_s_at "P"                   "A"                   "P"                  
202274_at   "A"                   "P"                   "P"                  
            GSM114845.CEL.present GSM114846.CEL.present GSM114847.CEL.present
200879_s_at "P"                   "P"                   "P"                  
200951_s_at "A"                   "P"                   "A"                  
200974_at   "P"                   "P"                   "P"                  
201242_s_at "P"                   "P"                   "P"                  
201468_s_at "P"                   "P"                   "P"                  
201496_x_at "P"                   "P"                   "A"                  
201497_x_at "P"                   "P"                   "P"                  
202040_s_at "P"                   "P"                   "P"                  
202104_s_at "M"                   "P"                   "P"                  
202274_at   "P"                   "P"                   "A"                  
            GSM114848.CEL.present GSM114849.CEL.present GSM114850.CEL.present
200879_s_at "P"                   "P"                   "P"                  
200951_s_at "A"                   "A"                   "A"                  
200974_at   "P"                   "P"                   "P"                  
201242_s_at "P"                   "P"                   "P"                  
201468_s_at "P"                   "P"                   "P"                  
201496_x_at "P"                   "P"                   "P"                  
201497_x_at "P"                   "P"                   "P"                  
202040_s_at "P"                   "P"                   "P"                  
202104_s_at "M"                   "P"                   "P"                  
202274_at   "A"                   "P"                   "A"                  
            GSM114851.CEL.present GSM114852.CEL.present GSM114853.CEL.present
200879_s_at "P"                   "P"                   "P"                  
200951_s_at "A"                   "A"                   "P"                  
200974_at   "P"                   "P"                   "P"                  
201242_s_at "P"                   "P"                   "P"                  
201468_s_at "P"                   "P"                   "P"                  
201496_x_at "P"                   "P"                   "A"                  
201497_x_at "P"                   "P"                   "A"                  
202040_s_at "P"                   "P"                   "P"                  
202104_s_at "P"                   "P"                   "P"                  
202274_at   "P"                   "P"                   "A"                  
            GSM114854.CEL.present GSM114855.CEL.present
200879_s_at "P"                   "P"                  
200951_s_at "A"                   "A"                  
200974_at   "P"                   "P"                  
201242_s_at "P"                   "P"                  
201468_s_at "P"                   "P"                  
201496_x_at "M"                   "P"                  
201497_x_at "M"                   "P"                  
202040_s_at "P"                   "P"                  
202104_s_at "P"                   "M"                  
202274_at   "A"                   "P"
  • log2 fold change between the two groups for each gene: the fc slot
pair.filt@fc[1:10]
200879_s_at 200951_s_at   200974_at 201242_s_at 201468_s_at 201496_x_at 
  0.6127330  -0.7095537   0.5851071  -0.6071841  -0.7674657   1.1975929 
201497_x_at 202040_s_at 202104_s_at   202274_at 
  0.9218226  -0.6092852   0.6140086   0.8075434
  • t-test p-vals: the tt slot
pair.filt@tt[1:10]
200879_s_at 200951_s_at   200974_at 201242_s_at 201468_s_at 201496_x_at 
0.045650045 0.023617284 0.044967617 0.046132070 0.037101434 0.024593573 
201497_x_at 202040_s_at 202104_s_at   202274_at 
0.047164613 0.001084647 0.017975104 0.033348173

Wednesday, June 12, 2013

R - change wildcard pattern into regular expression

  • the command glob2rx converts a string containing wildcards (such as * or ?) into an equivalent regular expression
glob2rx("pa??ern")
^pa..ern$
  • the arguments trim.head and trim.tail determine whether a leading "^.*" or a trailing ".*$" is trimmed from the result (defaults: trim.head=FALSE, trim.tail=TRUE)
glob2rx("*pa??ern",trim.head=TRUE)
pa..ern$
glob2rx("pa??e.n*",trim.tail=TRUE)
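
A typical use is to filter names with a wildcard pattern via grepl. A small sketch; the file names are made up for illustration.

```r
# Hypothetical file names
files <- c("pattern", "pa11ern.txt", "xpattern", "pa1ern")

# Convert the wildcard pattern once, then use it like any regular expression
rx <- glob2rx("pa??ern*")
matched <- files[grepl(rx, files)]
print(rx)
print(matched)

# A literal dot in the wildcard pattern is escaped in the result
rx_dot <- glob2rx("pa??e.n*", trim.tail = TRUE)
print(rx_dot)
```

list.files() accepts such a converted pattern directly in its pattern argument.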

Monday, June 10, 2013

R - first steps with GEOquery

Following chapter 18 of Lewis, R for Medicine and Biology

  • Installing the packages

    source("http://bioconductor.org/biocLite.R")
    biocLite("simpleaffy")
    biocLite("GEOquery")
    
    GEOquery
    
  • Gene Expression Omnibus Repository (GEO)

    • public repository
    • accepts submissions of high-throughput experimental data and makes them freely accessible to others
    • includes single- and dual-channel microarrays, measuring mRNA, miRNA, genomic DNA (incl. arrayCGH and SNP) and protein abundance
    • also contains a collection of web-based tools for querying and downloading of datasets
    • for R there exists the GEOquery package
  • sample dataset: Polycystic ovary syndrome: adipose tissue

    • get data
    library(GEOquery)
    gds <- getGEO("GDS2084")
    Meta(gds)
    
    Using locally cached version of GDS2084 found here:
    /tmp/Rtmpa2mlCM/GDS2084.soft.gz
    $channel_count
    [1] "1"
    
    $dataset_id
    [1] "GDS2084" "GDS2084"
    
    $description
    [1] "Analysis of omental adipose tissues of morbidly obese patients with polycystic ovary syndrome (PCOS). PCOS is a common hormonal disorder among women of reproductive age, and is characterized by hyperandrogenism and chronic anovulation. PCOS is associated with obesity."
    [2] "control"                                                                                                                                                                                                                                                                     
    [3] "polycystic ovary syndrome"                                                                                                                                                                                                                                                   
    
    $email
    [1] "geo@ncbi.nlm.nih.gov"
    
    $feature_count
    [1] "22283"
    
    $institute
    [1] "NCBI NLM NIH"
    
    $name
    [1] "Gene Expression Omnibus (GEO)"
    
    $order
    [1] "none"
    
    $platform
    [1] "GPL96"
    
    $platform_organism
    [1] "Homo sapiens"
    
    $platform_technology_type
    [1] "in situ oligonucleotide"
    
    $pubmed_id
    [1] "17062763"
    
    $ref
    [1] "Nucleic Acids Res. 2005 Jan 1;33 Database Issue:D562-6"
    
    $reference_series
    [1] "GSE5090"
    
    $sample_count
    [1] "15"
    
    $sample_id
    [1] "GSM114841,GSM114844,GSM114845,GSM114849,GSM114851,GSM114854,GSM114855"          
    [2] "GSM114834,GSM114842,GSM114843,GSM114847,GSM114848,GSM114850,GSM114852,GSM114853"
    
    $sample_organism
    [1] "Homo sapiens"
    
    $sample_type
    [1] "RNA"
    
    $title
    [1] "Polycystic ovary syndrome: adipose tissue"
    
    $type
    [1] "Expression profiling by array" "disease state"                
    [3] "disease state"                
    
    $update_date
    [1] "Mar 21 2007"
    
    $value_type
    [1] "count"
    
    $web_link
    [1] "http://www.ncbi.nlm.nih.gov/geo"
    


    • display individual expression values
      • first column (ID_REF): probe ID assigned by Affymetrix for each probe
      • second column (IDENTIFIER): ID for the corresponding transcript
      • remaining columns: expression values returned for each array

    Table(gds)[1:10,1:5]
    
    ID_REF IDENTIFIER GSM114841 GSM114844 GSM114845
    1  1007_s_at       DDR1     222.6     252.7     219.3
    2    1053_at       RFC2      35.5      24.5      23.4
    3     117_at      HSPA6      41.5      53.3      31.3
    4     121_at       PAX8     229.8     419.6     274.5
    5  1255_g_at     GUCA1A      14.3        13      29.6
    6    1294_at       UBA7     150.8       116      89.9
    7    1316_at       THRA      29.7      35.4        53
    8    1320_at     PTPN21      35.1      44.8      28.8
    9  1405_i_at       CCL5      47.8      53.2      41.7
    10   1431_at     CYP2E1      22.5      24.9      38.7
    
    • get disease status associated with each sample
    Columns(gds)[,1:2]
    
    sample             disease.state
    1  GSM114841                   control
    2  GSM114844                   control
    3  GSM114845                   control
    4  GSM114849                   control
    5  GSM114851                   control
    6  GSM114854                   control
    7  GSM114855                   control
    8  GSM114834 polycystic ovary syndrome
    9  GSM114842 polycystic ovary syndrome
    10 GSM114843 polycystic ovary syndrome
    11 GSM114847 polycystic ovary syndrome
    12 GSM114848 polycystic ovary syndrome
    13 GSM114850 polycystic ovary syndrome
    14 GSM114852 polycystic ovary syndrome
    15 GSM114853 polycystic ovary syndrome
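
To work with the two groups separately, the sample IDs can be split by disease state. The sketch below retypes the table printed above so that it runs without GEOquery; with the real object one would presumably start from Columns(gds) instead.

```r
# Sample annotation, copied from the output above
pheno <- data.frame(
  sample = c("GSM114841", "GSM114844", "GSM114845", "GSM114849",
             "GSM114851", "GSM114854", "GSM114855",
             "GSM114834", "GSM114842", "GSM114843", "GSM114847",
             "GSM114848", "GSM114850", "GSM114852", "GSM114853"),
  disease.state = rep(c("control", "polycystic ovary syndrome"), times = c(7, 8)),
  stringsAsFactors = FALSE
)

# One character vector of sample IDs per disease state
by_state <- split(pheno$sample, pheno$disease.state)
print(sapply(by_state, length))
```

The resulting vectors can then be used to pick the matching columns from Table(gds).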
    
    • information about source of samples
    Columns(gds)[,3]  
    
    [1] "Value for GSM114841: EP3_adipose_control; src: Omental adipose tissue"      
     [2] "Value for GSM114844: EP23_adipose_control; src: Omental adipose tissue"     
     [3] "Value for GSM114845: EP31_adipose_control_rep1; src: Omental adipose tissue"
     [4] "Value for GSM114849: EP37_adipose_control; src: Omental adipose tissue"     
     [5] "Value for GSM114851: EP49_adipose_control; src: Omental adipose tissue"     
     [6] "Value for GSM114854: EP69_adipose_control; src: Omental adipose tissue"     
     [7] "Value for GSM114855: EP71_adipose_control; src: Omental adipose tissue"     
     [8] "Value for GSM114834: EP1_adipose_pcos_rep1; src: Omental adipose tissue"    
     [9] "Value for GSM114842: EP10_adipose_pcos; src: Omental adipose tissue"        
    [10] "Value for GSM114843: EP18_adipose_pcos; src: Omental adipose tissue"        
    [11] "Value for GSM114847: EP32_adipose_pcos; src: Omental adipose tissue"        
    [12] "Value for GSM114848: EP34_adipose_pcos; src: Omental adipose tissue"        
    [13] "Value for GSM114850: EP47_adipose_pcos; src: Omental adipose tissue"        
    [14] "Value for GSM114852: EP55_adipose_pcos; src: Omental adipose tissue"        
    [15] "Value for GSM114853: EP66_adipose_pcos; src: Omental adipose tissue"
    
  • get information in more detail for the first sample (GSM114841)

    gsm <- getGEO("GSM114841")
    Meta(gsm)
    
    Using locally cached version of GSM114841 found here:
    /tmp/Rtmpa2mlCM/GSM114841.soft
    $biomaterial_provider_ch1
    [1] "Ramón y Cajal Hospital, Madrid, Spain"
    
    $channel_count
    [1] "1"
    
    $characteristics_ch1
    [1] "Morbidly obese control subject"
    
    $contact_address
    [1] "ARTURO DUPERIER"
    
    $contact_city
    [1] "MADRID"
    
    $contact_country
    [1] "Spain"
    
    $contact_email
    [1] "bperal@iib.uam.es"
    
    $contact_fax
    [1] "34 91 5854401"
    
    $contact_institute
    [1] "INSTITUTO DE INVESTIGACIONES BIOMEDICAS, CSIC-UAM"
    
    $contact_name
    [1] "BELEN,,PERAL"
    
    $contact_phone
    [1] "34 91 5854478"
    
    $contact_state
    [1] "MADRID"
    
    $`contact_zip/postal_code`
    [1] "28029"
    
    $data_processing
    [1] "MAS 5.0, scaled to 100 and RMA"
    
    $data_row_count
    [1] "22283"
    
    $description
    [1] "Total RNA was extracted from omental  adipose tissue from a control subject"
    
    $geo_accession
    [1] "GSM114841"
    
    $label_ch1
    [1] "Biotin"
    
    $last_update_date
    [1] "Jun 16 2006"
    
    $molecule_ch1
    [1] "total RNA"
    
    $organism_ch1
    [1] "Homo sapiens"
    
    $platform_id
    [1] "GPL96"
    
    $series_id
    [1] "GSE5090"
    
    $source_name_ch1
    [1] "Omental adipose tissue"
    
    $status
    [1] "Public on Jun 17 2006"
    
    $submission_date
    [1] "Jun 16 2006"
    
    $supplementary_file
    [1] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM114nnn/GSM114841/suppl/GSM114841.CEL.gz"
    [2] "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM114nnn/GSM114841/suppl/GSM114841.EXP.gz"
    
    $taxid_ch1
    [1] "9606"
    
    $title
    [1] "EP3_adipose_control"
    
    $type
    [1] "RNA"
    
    • look at the additional data
    Columns(gsm)
    
    Column             Description
    ID_REF
    VALUE              Signal intensity - MAS 5.0, scaled to 100 and RMA
    ABS_CALL           Presence/absence of gene transcript in sample; the call in an absolute analysis that indicates if the transcript was present (P), absent (A), marginal (M), or no call (NC)
    Detection p-value  p-value that indicates the significance level of the detection call
    • so there are four columns
      • probe ID
      • expression value (output of the MAS 5.0 analysis software)
      • Detection Call
      • p-val (Detection p-val)
    Table(gsm)[500:510,]
    
            ID_REF VALUE ABS_CALL Detection p-value
    500   33307_at  47.8        P          0.039365
    501   33304_at  35.3        A          0.339558
    502   33197_at  58.4        P          0.017001
    503   33148_at  21.4        P          0.019304
    504   33132_at  15.8        A          0.189687
    505   32837_at   274        P          0.000959
    506   32836_at 319.8        P          0.006532
    507   32811_at 213.3        P          0.024711
    508   32723_at    35        P          0.004863
    509 32699_s_at  43.9        A          0.162935
    510   32625_at  67.1        A           0.11716
    
  • get a series record

    • contains lists for GPL platform object and the GSM sample record object
    • retrieve GSE data structure for our example in one go
    gse <- getGEO("GSE5090")
     gse
    
    Found 1 file(s)
    GSE5090_series_matrix.txt.gz
    Using locally cached version: /tmp/Rtmpa2mlCM/GSE5090_series_matrix.txt.gz
    Using locally cached version of GPL96 found here:
    /tmp/Rtmpa2mlCM/GPL96.soft
    $GSE5090_series_matrix.txt.gz
    ExpressionSet (storageMode: lockedEnvironment)
    assayData: 22283 features, 17 samples 
      element names: exprs 
    protocolData: none
    phenoData
      sampleNames: GSM114834 GSM114840 ... GSM114855 (17 total)
      varLabels: title geo_accession ... data_row_count (30 total)
      varMetadata: labelDescription
    featureData
      featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (22283 total)
      fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
      fvarMetadata: Column Description labelDescription
    experimentData: use 'experimentData(object)'
    Annotation: GPL96
    

Ubuntu - Samba file sharing



  • install samba
  • config files are in /etc/samba (with examples)
  • create the corresponding smb user: sudo smbpasswd -a user

Simple Example (German) here