How to read excel spreadsheet into a data frame in R

We provide effective and economically affordable training courses for R and Python, click here for more details and course registration !

Data frames in R are the mostly widely used object type in R data analysis, due to the fact that a data frame can afford to have different variables with different modes(numeric, character, Boolean, etc). The data sources that are adaptable to create a data frame in R are versatile. Excel files are such kinds of data sources that appear as tabular, similar as data frames in R. Reading excel files into R session is easy going with read_xlsx() function.

Say we have an excel file located in the current working directory in R session, and the file contains several work sheets. The following screen shows the first 5 rows of the first sheet.

mpg	cyl	hp	wt	vs	am
21	6	110	2.62	0	1
21	6	110	2.875	0	1
22.8	4	93	2.32	1	1
21.4	6	110	3.215	1	0

Then we can use read_xlsx() to read the first and second sheet into R.

#load library first
library(readxl)
workbook <- "D:\\RStatistics-Tutorial\\cars.xlsx"
#by default first line in excel is read as column variables in dataframe
mydataframe1 <- read_xlsx(workbook, 1)
mydataframe2 <- read_xlsx(workbook, 2)
#to show data frames
head(mydataframe1)
#output
# A tibble: 6 × 6
    mpg   cyl    hp     wt    vs    am
  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1  21       6   110   2.62     0     1
2  21       6   110   2.88     0     1
3  22.8     4    93   2.32     1     1
4  21.4     6   110   3.22     1     0
5  18.7     8   175 999        0     0
6  18.1     6   105   3.46     1     0
head(mydataframe2)
#output
# A tibble: 6 × 6
    mpg   cyl    hp     wt    vs    am
  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1  21       6   110   2.62     0     1
2  21       6   110   2.88     0     1
3  22.8     4    93   2.32     1     1
4  21.4     6   110   3.22     1     0
5  18.7     8   175 999        0     0
6  18.1     6   105   3.46     1     0

The alternative to use work sheet number in the function is using the name of the work sheet. In the following example, we specify the work sheet name ‘car’ when reading the file.

#reading working sheet 'car' into data frame
mydataframe1 <- read_xlsx(workbook, "cars")
head(mydataframe1)
#output
# A tibble: 6 × 6
    mpg   cyl    hp     wt    vs    am
  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1  21       6   110   2.62     0     1
2  21       6   110   2.88     0     1
3  22.8     4    93   2.32     1     1
4  21.4     6   110   3.22     1     0
5  18.7     8   175 999        0     0
6  18.1     6   105   3.46     1     0

Specific values can also be identified with missing value using option ‘na’ in the function.

#value 0 replaced with na
mydataframe1 <- read_xlsx(workbook, sheet = 1, na="0")
head(mydataframe1)
#output, now the data frame contains multiple NA for missing values
# A tibble: 6 × 6
    mpg   cyl    hp     wt    vs    am
  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1  21       6   110   2.62    NA     1
2  21       6   110   2.88    NA     1
3  22.8     4    93   2.32     1     1
4  21.4     6   110   3.22     1    NA
5  18.7     8   175 999       NA    NA
6  18.1     6   105   3.46     1    NA

Instead of reading the whole work sheet, read_xlsx() function provides also the availability to read subset of the fields into a data frame.

#Choose cells A1:C9 in worksheet to read from
mydataframe1 <- read_xlsx(workbook, sheet = 1, range="A1:C9")
head(mydataframe1)
#output
# A tibble: 6 × 3
    mpg   cyl    hp
  <dbl> <dbl> <dbl>
1  21       6   110
2  21       6   110
3  22.8     4    93
4  21.4     6   110
5  18.7     8   175
6  18.1     6   105

For getting more knowledge of R, you can watch R tutorial videos on our YouTube channel !

Published by wilsonzhang746 on June 10, 2024June 10, 2024

0 Comments

Leave a Reply Cancel reply

Download R Course source files

How to delete columns of a data frame in Python

Mathematical operations between Pandas Series in Python