Data frames in R are the mostly widely used object type in R data analysis, due to the fact that a data frame can afford to have different variables with different modes(numeric, character, Boolean, etc). The data sources that are adaptable to create a data frame in R are versatile. Excel files are such kinds of data sources that appear as tabular, similar as data frames in R. Reading excel files into R session is easy going with read_xlsx() function.
Say we have an excel file located in the current working directory in R session, and the file contains several work sheets. The following screen shows the first 5 rows of the first sheet.
mpg cyl hp wt vs am
21 6 110 2.62 0 1
21 6 110 2.875 0 1
22.8 4 93 2.32 1 1
21.4 6 110 3.215 1 0
Then we can use read_xlsx() to read the first and second sheet into R.
#load library first
library(readxl)
workbook <- "D:\\RStatistics-Tutorial\\cars.xlsx"
#by default first line in excel is read as column variables in dataframe
mydataframe1 <- read_xlsx(workbook, 1)
mydataframe2 <- read_xlsx(workbook, 2)
#to show data frames
head(mydataframe1)
#output
# A tibble: 6 × 6
mpg cyl hp wt vs am
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 110 2.62 0 1
2 21 6 110 2.88 0 1
3 22.8 4 93 2.32 1 1
4 21.4 6 110 3.22 1 0
5 18.7 8 175 999 0 0
6 18.1 6 105 3.46 1 0
head(mydataframe2)
#output
# A tibble: 6 × 6
mpg cyl hp wt vs am
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 110 2.62 0 1
2 21 6 110 2.88 0 1
3 22.8 4 93 2.32 1 1
4 21.4 6 110 3.22 1 0
5 18.7 8 175 999 0 0
6 18.1 6 105 3.46 1 0
The alternative to use work sheet number in the function is using the name of the work sheet. In the following example, we specify the work sheet name ‘car’ when reading the file.
#reading working sheet 'car' into data frame
mydataframe1 <- read_xlsx(workbook, "cars")
head(mydataframe1)
#output
# A tibble: 6 × 6
mpg cyl hp wt vs am
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 110 2.62 0 1
2 21 6 110 2.88 0 1
3 22.8 4 93 2.32 1 1
4 21.4 6 110 3.22 1 0
5 18.7 8 175 999 0 0
6 18.1 6 105 3.46 1 0
Specific values can also be identified with missing value using option ‘na’ in the function.
#value 0 replaced with na
mydataframe1 <- read_xlsx(workbook, sheet = 1, na="0")
head(mydataframe1)
#output, now the data frame contains multiple NA for missing values
# A tibble: 6 × 6
mpg cyl hp wt vs am
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 110 2.62 NA 1
2 21 6 110 2.88 NA 1
3 22.8 4 93 2.32 1 1
4 21.4 6 110 3.22 1 NA
5 18.7 8 175 999 NA NA
6 18.1 6 105 3.46 1 NA
Instead of reading the whole work sheet, read_xlsx() function provides also the availability to read subset of the fields into a data frame.
#Choose cells A1:C9 in worksheet to read from
mydataframe1 <- read_xlsx(workbook, sheet = 1, range="A1:C9")
head(mydataframe1)
#output
# A tibble: 6 × 3
mpg cyl hp
<dbl> <dbl> <dbl>
1 21 6 110
2 21 6 110
3 22.8 4 93
4 21.4 6 110
5 18.7 8 175
6 18.1 6 105
For getting more knowledge of R, you can watch R tutorial videos on our YouTube channel !
0 Comments