Use spread() from tidyr in R to convert a long-form dataset into wide form


dplyr and tidyr are packages that belong to the tidyverse framework. Both work with the pipe operator (%>%), which chains multiple functions together into one statement to make data management more effective. spread() is provided by tidyr, which is loaded together with dplyr when you run library(tidyverse). It takes the values of one column (the key) in the current data frame and makes them column labels in the resulting data frame, filling the new columns with the values of a second column (the value). This operation is also called transforming a long-form dataset into a wide-form dataset. spread() is the opposite of the gather() function in tidyr, which converts a wide-form dataset into long form.
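
To make the relationship between spread() and gather() concrete, here is a minimal sketch of a round trip. The data frame `long` and its column names (id, key, value) are made up for illustration and do not come from the examples in this post.

```r
library(tidyr)

# Hypothetical long-form data: one row per (id, key) pair
long <- data.frame(
  id    = c(1, 1, 2, 2),
  key   = c("height", "weight", "height", "weight"),
  value = c(170, 65, 180, 80),
  stringsAsFactors = FALSE
)

# Long to wide: the values of `key` become column names
wide <- spread(long, key = key, value = value)

# Wide to long: gather() stacks the height and weight columns again
long_again <- gather(wide, key = "key", value = "value", height, weight)
```

The row order of long_again may differ from long, but it holds the same id, key and value combinations.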

In the following example, a data frame df_1 with three columns is created first. Then df_1 is passed to the filter() function from dplyr to exclude rows where ‘Answer’ is missing (NA). Next, the column ‘Answer’ is spread so that its values (“No” and “Yes”) become column labels in the resulting data frame df_2, and the cells are filled with the values of ‘n’.

#load library tidyverse
#you can also load just dplyr and tidyr if you have them installed
> library(tidyverse)

#create data frame df_1
> df_1 <- tibble(Type = c("TypeA", "TypeA", "TypeB", "TypeB"), Answer = c("Yes", "No", NA, "No"), n = 1:4)

#show data frame
> df_1 
# A tibble: 4 × 3
  Type  Answer     n
  <chr> <chr>  <int>
1 TypeA Yes        1
2 TypeA No         2
3 TypeB NA         3
4 TypeB No         4
> 

#filter data frame first to exclude rows with missing value NA in Answer
#after spread(), "No" and "Yes" from Answer become separate columns in the resulting data frame df_2
> df_2 <- df_1 %>%
   filter(!is.na(Answer)) %>%
   spread(key=Answer, value=n)

#show data frame df_2 
> df_2
# A tibble: 2 × 3
  Type     No   Yes
  <chr> <int> <int>
1 TypeA     2     1
2 TypeB     4    NA
>
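
The NA under Yes for TypeB appears because no TypeB/Yes row is left after filtering. If a default value is preferable, spread() accepts a fill argument. The following is a small sketch under that assumption; the name df_filled is made up for illustration, and choosing 0 as the replacement is only one possible option.

```r
library(dplyr)
library(tidyr)

# Same data as df_1 above, rebuilt here so the sketch is self-contained
df_1 <- tibble(Type = c("TypeA", "TypeA", "TypeB", "TypeB"),
               Answer = c("Yes", "No", NA, "No"),
               n = 1:4)

# fill = 0L replaces combinations that have no row with 0 instead of NA
df_filled <- df_1 %>%
  filter(!is.na(Answer)) %>%
  spread(key = Answer, value = n, fill = 0L)
```

Whether 0 is a sensible stand-in depends on what n means; for counts it usually is, for measurements it may hide missing data.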

In the following example, a data frame df2 with four columns is created first. Then the values of the variable ‘stat’ are spread out and become column labels in the resulting data frame, with the cells filled from ‘amount’.

#create data frame df2
> df2 <- data.frame(player=rep(c('A'), times=8),
                   year=rep(c(1, 2), each=4),
                   stat=rep(c('points', 'assists', 'steals',  
                    'blocks'), times=2),
                   amount=c(14, 6, 2, 1, 29, 9, 3, 4))
> 
#view data frame
> df2
  player year    stat amount
1      A    1  points     14
2      A    1 assists      6
3      A    1  steals      2
4      A    1  blocks      1
5      A    2  points     29
6      A    2 assists      9
7      A    2  steals      3
8      A    2  blocks      4
> 
#use the spread() function to turn the four unique values 
#in the stat column into four new columns:
> spread(df2, key=stat, value=amount)
  player year assists blocks points steals
1      A    1       6      1     14      2
2      A    2       9      4     29      3
>
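
In tidyr 1.0.0 and later, spread() is marked as superseded by pivot_wider(). As a rough sketch, the same reshaping of df2 could be written as below; the result name wide is made up for illustration.

```r
library(tidyr)

# Same data as df2 above, rebuilt here so the sketch is self-contained
df2 <- data.frame(player = rep("A", times = 8),
                  year = rep(c(1, 2), each = 4),
                  stat = rep(c("points", "assists", "steals", "blocks"), times = 2),
                  amount = c(14, 6, 2, 1, 29, 9, 3, 4))

# names_from plays the role of key, values_from the role of value
wide <- pivot_wider(df2, names_from = stat, values_from = amount)
```

One visible difference is that pivot_wider() keeps the new columns in order of first appearance (points, assists, steals, blocks), whereas spread() sorted them alphabetically in the output above.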


Published by wilsonzhang746 on February 15, 2024
