A data frame in R often contains dozens of variables, of which only a subset is needed for a given analysis. Selecting those variables and saving them into a new object keeps data management clear and concise.
Suppose we have a data frame named ‘grade’ that holds students' test scores. For the next step, only the students' names and their ‘Math’, ‘Physics’ and ‘Chemistry’ scores are needed. The simplest way to select those variables is to put their column index numbers in square brackets.
#show first observations of the data frame
head(grade)
#output
StudentID First Last Gender Country Age Math Physics
1 1 James Zhang Male US 23 73 70
2 2 Wilson Li Male UK 26 95 999
3 3 Richard Nuan Ye Male UK 35 77 83
4 4 Mary Deng Female US 21 60 99
5 5 Jason Wilson Male UK 19 77 89
6 6 Jennifer Hopkin Female UK 43 79 64
Chemistry Date
1 87 10/31/08
2 83 03/16/08
3 92 05/22/08
4 84 01/24/09
5 93 07/30/09
6 83 04/05/09
#select wanted variables, using indexing with the column numbers
new_df1 <- grade[c(2,3,7,8,9)]
#show first observations of the new data frame
head(new_df1)
#output
First Last Math Physics Chemistry
1 James Zhang 73 70 87
2 Wilson Li 95 999 83
3 Richard Nuan Ye 77 83 92
4 Mary Deng 60 99 84
5 Jason Wilson 77 89 93
6 Jennifer Hopkin 79 64 83
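Since the ‘grade’ data frame itself is not supplied, the indexing step can be tried on a small stand-in. The sketch below is an assumption for illustration only: ‘grade_demo’ is a hypothetical data frame built from the first two rows of the output shown above, and it also checks that the one-argument form grade[...] and the two-argument form grade[, ...] pick the same columns.

```r
# Hypothetical stand-in for 'grade', built from the first two rows shown above
grade_demo <- data.frame(
  StudentID = c(1, 2),
  First     = c("James", "Wilson"),
  Last      = c("Zhang", "Li"),
  Gender    = c("Male", "Male"),
  Country   = c("US", "UK"),
  Age       = c(23, 26),
  Math      = c(73, 95),
  Physics   = c(70, 999),
  Chemistry = c(87, 83),
  Date      = c("10/31/08", "03/16/08"),
  stringsAsFactors = FALSE
)

# List the column names to look up their positions before indexing by number
names(grade_demo)

# One-argument form: the data frame is indexed like a list of columns
by_list <- grade_demo[c(2, 3, 7, 8, 9)]

# Two-argument form: all rows, the same columns
by_matrix <- grade_demo[, c(2, 3, 7, 8, 9)]

# Compare the two results
all.equal(by_list, by_matrix)
```

Index numbers depend on the column order, so it is worth checking names() again whenever columns are added to or removed from the data frame.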
The second way is to put the variable names, as a character vector, in the square brackets.
#select variables, using column names
new_df2 <- grade[c("First","Last","Math","Physics","Chemistry")]
#show first observations of the new data frame
head(new_df2)
#output
First Last Math Physics Chemistry
1 James Zhang 73 70 87
2 Wilson Li 95 999 83
3 Richard Nuan Ye 77 83 92
4 Mary Deng 60 99 84
5 Jason Wilson 77 89 93
6 Jennifer Hopkin 79 64 83
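Selecting by name keeps working when the column order changes, but it fails if a name is misspelled or missing. A minimal sketch of a check before selecting, assuming a hypothetical stand-in data frame ‘grade_demo’ built from the first two rows shown earlier (the names ‘wanted’ and ‘missing_cols’ are illustrative, not part of the original example):

```r
# Hypothetical stand-in for 'grade', built from the first two rows shown earlier
grade_demo <- data.frame(
  StudentID = c(1, 2),
  First     = c("James", "Wilson"),
  Last      = c("Zhang", "Li"),
  Gender    = c("Male", "Male"),
  Country   = c("US", "UK"),
  Age       = c(23, 26),
  Math      = c(73, 95),
  Physics   = c(70, 999),
  Chemistry = c(87, 83),
  Date      = c("10/31/08", "03/16/08"),
  stringsAsFactors = FALSE
)

# Keep the wanted names in a vector so they can be checked and reused
wanted <- c("First", "Last", "Math", "Physics", "Chemistry")

# Names that are requested but not present in the data frame
missing_cols <- setdiff(wanted, names(grade_demo))
if (length(missing_cols) > 0) {
  stop("Columns not found: ", paste(missing_cols, collapse = ", "))
}

new_demo <- grade_demo[wanted]

# Reversing the column order does not affect selection by name
shuffled <- grade_demo[rev(names(grade_demo))]
all.equal(shuffled[wanted], new_demo)
```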
The third method is the opposite of the first: exclude the unwanted variables by putting a minus sign in front of their index numbers.
#select variables, by excluding unwanted variables
new_df3 <- grade[c(-1,-4,-5,-6,-10)]
#show first observations of the new data frame
head(new_df3)
#output
First Last Math Physics Chemistry
1 James Zhang 73 70 87
2 Wilson Li 95 999 83
3 Richard Nuan Ye 77 83 92
4 Mary Deng 60 99 84
5 Jason Wilson 77 89 93
6 Jennifer Hopkin 79 64 83
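Two variations on exclusion may be handy. A single minus sign in front of c() negates every index at once, and because the minus sign cannot be applied to character names, exclusion by name can go through setdiff(). The sketch below assumes a hypothetical stand-in data frame ‘grade_demo’ built from the first two rows shown earlier.

```r
# Hypothetical stand-in for 'grade', built from the first two rows shown earlier
grade_demo <- data.frame(
  StudentID = c(1, 2),
  First     = c("James", "Wilson"),
  Last      = c("Zhang", "Li"),
  Gender    = c("Male", "Male"),
  Country   = c("US", "UK"),
  Age       = c(23, 26),
  Math      = c(73, 95),
  Physics   = c(70, 999),
  Chemistry = c(87, 83),
  Date      = c("10/31/08", "03/16/08"),
  stringsAsFactors = FALSE
)

# Negating each index separately, as in the example above
drop_each <- grade_demo[c(-1, -4, -5, -6, -10)]

# Negating the whole index vector with one minus sign
drop_once <- grade_demo[-c(1, 4, 5, 6, 10)]

# Excluding by name: keep every column whose name is not in 'unwanted'
unwanted <- c("StudentID", "Gender", "Country", "Age", "Date")
drop_named <- grade_demo[setdiff(names(grade_demo), unwanted)]

names(drop_named)
```

Positive and negative index numbers cannot be mixed in the same pair of brackets, so a selection is written either as columns to keep or as columns to drop.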
The fourth method uses the select() function from the dplyr package.
#using select() of dplyr to select variables
library(dplyr)
new_df4 <- select(grade,First,Last,Math,Physics,Chemistry)
#show first observations of the new data frame
head(new_df4)
#output
First Last Math Physics Chemistry
1 James Zhang 73 70 87
2 Wilson Li 95 999 83
3 Richard Nuan Ye 77 83 92
4 Mary Deng 60 99 84
5 Jason Wilson 77 89 93
6 Jennifer Hopkin 79 64 83
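select() also accepts ranges of adjacent columns, exclusion with a minus sign, and names stored in a character vector. A brief sketch, assuming dplyr 1.0.0 or later is installed and using a hypothetical stand-in data frame ‘grade_demo’ built from the first two rows shown earlier:

```r
library(dplyr)

# Hypothetical stand-in for 'grade', built from the first two rows shown earlier
grade_demo <- data.frame(
  StudentID = c(1, 2),
  First     = c("James", "Wilson"),
  Last      = c("Zhang", "Li"),
  Gender    = c("Male", "Male"),
  Country   = c("US", "UK"),
  Age       = c(23, 26),
  Math      = c(73, 95),
  Physics   = c(70, 999),
  Chemistry = c(87, 83),
  Date      = c("10/31/08", "03/16/08"),
  stringsAsFactors = FALSE
)

# Ranges of adjacent columns, following the column order of the data frame
by_range <- select(grade_demo, First:Last, Math:Chemistry)

# Exclusion by name with a minus sign
by_minus <- select(grade_demo, -StudentID, -(Gender:Age), -Date)

# Names held in a character vector, wrapped in all_of()
wanted <- c("First", "Last", "Math", "Physics", "Chemistry")
by_vector <- select(grade_demo, all_of(wanted))

names(by_vector)
```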