Saturday, April 25, 2015

R - read excel files without java or perl dependencies

1 read Excel files without java or perl dependencies

With the new package from hadley wickham it is finally possible to read excel files without java or excel dependencies, yeah… For now there are only two functions:

  • excel_sheets() to get all sheets contained in an excel file
  • read_excel() to read one sheet out of an excel file
  • get the filenames of excel files contained in the current directory

require(readxl)
dir(pattern = "xls")

[1] "20140927excelfile.xlsx" "example.xlsx"

  • get the names of the excel sheets contained in the excel file (only one in this case)
  • the function has the file name resp. the whole path as argument


excel_sheets("example.xlsx")

[1] "conspvalues"

  • read in the sheet, the function takes the path and the sheet as argument
  • the sheet can be given as position (first sheet = 1, second = 2, etc) or as name (the first sheet is the default)


xx <- read_excel("example.xlsx",1)
head(xx)

mpg cyl    disp  hp drat      wt  qsec vs am gear carb
1 21.0   6     160 110 3.90    2.62 16.46  0  1    4    4
2 21.0   6 missing 110 3.90   2.875 17.02  0  1    4    4
3 22.8   4     108  93 3.85    2.32 18.61  1  1    4    1
4 21.4   6     258 110 3.08 missing 19.44  1  0    3    1
5 18.7   8     360 175 3.15    3.44 17.02  0  0    3    2
6 18.1   6     225 105 2.76    3.46 20.22  1  0    3    1


xx <- read_excel("example.xlsx","conspvalues")
head(xx)

mpg cyl    disp  hp drat      wt  qsec vs am gear carb
1 21.0   6     160 110 3.90    2.62 16.46  0  1    4    4
2 21.0   6 missing 110 3.90   2.875 17.02  0  1    4    4
3 22.8   4     108  93 3.85    2.32 18.61  1  1    4    1
4 21.4   6     258 110 3.08 missing 19.44  1  0    3    1
5 18.7   8     360 175 3.15    3.44 17.02  0  0    3    2
6 18.1   6     225 105 2.76    3.46 20.22  1  0    3    1

  • there is an additional argument to set na strings (na)

xx <- read_excel("example.xlsx","conspvalues",na="missing")
head(xx)

mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2 21.0   6   NA 110 3.90 2.875 17.02  0  1    4    4
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
4 21.4   6  258 110 3.08    NA 19.44  1  0    3    1
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

  • with skip you can skip the first lines


xx <- read_excel("example.xlsx","conspvalues",skip=1)
head(xx)

21 6     160 110  3.9    2.62 16.46 0 1 4 4
1 21.0 6 missing 110 3.90   2.875 17.02 0 1 4 4
2 22.8 4     108  93 3.85    2.32 18.61 1 1 4 1
3 21.4 6     258 110 3.08 missing 19.44 1 0 3 1
4 18.7 8     360 175 3.15    3.44 17.02 0 0 3 2
5 18.1 6     225 105 2.76    3.46 20.22 1 0 3 1
6 14.3 8     360 245 3.21    3.57 15.84 0 0 3 4

  • and you can use col_names to tell the function whether or not the excel file containes the column names


xx <- read_excel("example.xlsx","conspvalues",skip=1, col_names=F)
head(xx)

X0 X1      X2  X3   X4      X5    X6 X7 X8 X9 X10
1 21.0  6     160 110 3.90    2.62 16.46  0  1  4   4
2 21.0  6 missing 110 3.90   2.875 17.02  0  1  4   4
3 22.8  4     108  93 3.85    2.32 18.61  1  1  4   1
4 21.4  6     258 110 3.08 missing 19.44  1  0  3   1
5 18.7  8     360 175 3.15    3.44 17.02  0  0  3   2
6 18.1  6     225 105 2.76    3.46 20.22  1  0  3   1


  • using col_types you can define the types of your columns (instead of letting R guess), attention: NA values may be produced
  • you need one name and type for each column


xx <- read_excel("example.xlsx","conspvalues",skip=1, col_types = rep("numeric",11))
head(xx)

Warnmeldungen:
1: In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types,  :
  [3, 3]: expecting numeric: got 'missing'
2: In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types,  :
  [5, 6]: expecting numeric: got 'missing'
    21 6 160 110  3.9  2.62 16.46 0 1 4 4
1 21.0 6  NA 110 3.90 2.875 17.02 0 1 4 4
2 22.8 4 108  93 3.85 2.320 18.61 1 1 4 1
3 21.4 6 258 110 3.08    NA 19.44 1 0 3 1
4 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
5 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
6 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4