1 read Excel files without java or perl dependencies
With the new package from hadley wickham it is finally possible to read excel files without java or excel dependencies, yeah… For now there are only two functions:
- excel_sheets() to get all sheets contained in an excel file
- read_excel() to read one sheet out of an excel file
- get the filenames of excel files contained in the current directory
require(readxl) dir(pattern = "xls")
[1] "20140927excelfile.xlsx" "example.xlsx"
- get the names of the excel sheets contained in the excel file (only one in this case)
- the function has the file name resp. the whole path as argument
excel_sheets("example.xlsx")
[1] "conspvalues"
- read in the sheet, the function takes the path and the sheet as argument
- the sheet can be given as position (first sheet = 1, second = 2, etc) or as name (the first sheet is the default)
xx <- read_excel("example.xlsx",1) head(xx)
mpg cyl disp hp drat wt qsec vs am gear carb 1 21.0 6 160 110 3.90 2.62 16.46 0 1 4 4 2 21.0 6 missing 110 3.90 2.875 17.02 0 1 4 4 3 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1 4 21.4 6 258 110 3.08 missing 19.44 1 0 3 1 5 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2 6 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
xx <- read_excel("example.xlsx","conspvalues") head(xx)
mpg cyl disp hp drat wt qsec vs am gear carb 1 21.0 6 160 110 3.90 2.62 16.46 0 1 4 4 2 21.0 6 missing 110 3.90 2.875 17.02 0 1 4 4 3 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1 4 21.4 6 258 110 3.08 missing 19.44 1 0 3 1 5 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2 6 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
- there is an additional argument to set na strings (na)
xx <- read_excel("example.xlsx","conspvalues",na="missing") head(xx)
mpg cyl disp hp drat wt qsec vs am gear carb 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2 21.0 6 NA 110 3.90 2.875 17.02 0 1 4 4 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 4 21.4 6 258 110 3.08 NA 19.44 1 0 3 1 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
- with skip you can skip the first lines
xx <- read_excel("example.xlsx","conspvalues",skip=1) head(xx)
21 6 160 110 3.9 2.62 16.46 0 1 4 4 1 21.0 6 missing 110 3.90 2.875 17.02 0 1 4 4 2 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1 3 21.4 6 258 110 3.08 missing 19.44 1 0 3 1 4 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2 5 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1 6 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
- and you can use col_names to tell the function whether or not the excel file containes the column names
xx <- read_excel("example.xlsx","conspvalues",skip=1, col_names=F) head(xx)
X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 1 21.0 6 160 110 3.90 2.62 16.46 0 1 4 4 2 21.0 6 missing 110 3.90 2.875 17.02 0 1 4 4 3 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1 4 21.4 6 258 110 3.08 missing 19.44 1 0 3 1 5 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2 6 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
- using col_types you can define the types of your columns (instead of letting R guess), attention: NA values may be produced
- you need one name and type for each column
xx <- read_excel("example.xlsx","conspvalues",skip=1, col_types = rep("numeric",11)) head(xx)
Warnmeldungen: 1: In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types, : [3, 3]: expecting numeric: got 'missing' 2: In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types, : [5, 6]: expecting numeric: got 'missing' 21 6 160 110 3.9 2.62 16.46 0 1 4 4 1 21.0 6 NA 110 3.90 2.875 17.02 0 1 4 4 2 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3 21.4 6 258 110 3.08 NA 19.44 1 0 3 1 4 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 5 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 6 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4