Wednesday, June 8, 2011

R - import excel file

  • the command read.xls("Path/to/data/data.xls") is part of the gdata package, so you have to install and load it
  • str() shows the structure of the read data
Example:
library(ggplot2)
> x<-read.xls("c:\\R\\20110511daten.xls") # on Win
> x<-read.xls("~/R/20110511daten.xls") # on Unix
> str(x)
'data.frame':   7554 obs. of  20 variables:
$ sex : int  1 1 1 1 1 0 0 0 0 0 ...
$ date : Factor w/ 4312 levels "1966-Mar-01",..: 609 655 704 175 1920 2033 217 1899 977 ...
$ month : int  3 12 7 12 11 6 11 11 6 1 ...
$ year : int  2011 2008 2007 2007 2010 2008 2010 2007 2010 2010 ...
$ age : num  17.5 15.9 14 17.9 12.6 ...
everything is fine except the second variable: date is read as a factor not as date time; it has to be converted into a date.
  • above you can see date is stored as "year-month-day"
  • so all we have to do is the following:
  • library(ggplot2)> x$date<-as.Date(x$date, format="%Y-%b-%d")
    > str(x)
    
    'data.frame':   7554 obs. of  21 variables:
    $ sex         : int  1 1 1 1 1 0 0 0 0 0 ...
    $ date        :Class 'Date'  num [1:7554] 8629 8407 8572 7315 10342 ...
    $ month       : int  3 12 7 12 11 6 11 11 6 1 ...
    $ year        : int  2011 2008 2007 2007 2010 2008 2010 2007 2010 2010 ...
    
  • now date has the class Date and if you run summary on it you get the right informations:
    > summary(x$date)
    Min.1st Qu.MedianMean3rd Qu.Max.
    "1975-11-06" "1995-08-13" "1999-02-09""1998-12-31""2002-09-10""2008-02-17"
  • further informations about the format argument you find here.

No comments :

Post a Comment