With the new package from hadley wickham it is finally possible to read excel files without java or excel dependencies, yeah… For now there are only two functions:
- excel_sheets() to get all sheets contained in an excel file
- read_excel() to read one sheet out of an excel file
- get the filenames of excel files contained in the current directory
require(readxl)
dir(pattern = "xls")
[1] "20140927excelfile.xlsx" "example.xlsx"
- get the names of the excel sheets contained in the excel file (only one in this case)
- the function has the file name resp. the whole path as argument
excel_sheets("example.xlsx")
[1] "conspvalues"
- read in the sheet, the function takes the path and the sheet as argument
- the sheet can be given as position (first sheet = 1, second = 2, etc) or as name (the first sheet is the default)
xx <- read_excel("example.xlsx",1)
head(xx)
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.62 16.46 0 1 4 4
2 21.0 6 missing 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
4 21.4 6 258 110 3.08 missing 19.44 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
xx <- read_excel("example.xlsx","conspvalues")
head(xx)
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.62 16.46 0 1 4 4
2 21.0 6 missing 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
4 21.4 6 258 110 3.08 missing 19.44 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
- there is an additional argument to set na strings (na)
xx <- read_excel("example.xlsx","conspvalues",na="missing")
head(xx)
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 NA 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258 110 3.08 NA 19.44 1 0 3 1
5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
- with skip you can skip the first lines
xx <- read_excel("example.xlsx","conspvalues",skip=1)
head(xx)
21 6 160 110 3.9 2.62 16.46 0 1 4 4
1 21.0 6 missing 110 3.90 2.875 17.02 0 1 4 4
2 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
3 21.4 6 258 110 3.08 missing 19.44 1 0 3 1
4 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
5 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
6 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
- and you can use col_names to tell the function whether or not the excel file containes the column names
xx <- read_excel("example.xlsx","conspvalues",skip=1, col_names=F)
head(xx)
X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 21.0 6 160 110 3.90 2.62 16.46 0 1 4 4
2 21.0 6 missing 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
4 21.4 6 258 110 3.08 missing 19.44 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
- using col_types you can define the types of your columns (instead of letting R guess), attention: NA values may be produced
- you need one name and type for each column
xx <- read_excel("example.xlsx","conspvalues",skip=1, col_types = rep("numeric",11))
head(xx)
Warnmeldungen:
1: In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types, :
[3, 3]: expecting numeric: got 'missing'
2: In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types, :
[5, 6]: expecting numeric: got 'missing'
21 6 160 110 3.9 2.62 16.46 0 1 4 4
1 21.0 6 NA 110 3.90 2.875 17.02 0 1 4 4
2 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
3 21.4 6 258 110 3.08 NA 19.44 1 0 3 1
4 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
5 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
6 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4