How to select variables in a data frame with R

A data frame in R often contains dozens of variables, of which only a subset is needed for a given analysis. Selecting those variables and saving them into a new object keeps data management clear and concise.

Say we have a data frame of student test scores named ‘grade’. For the next step, only the students' first and last names and the ‘Math’, ‘Physics’ and ‘Chemistry’ variables are needed. The easiest way to select those variables is to put their column index numbers inside square brackets.

#show first observations of the data frame
head(grade)
#output
  StudentID    First    Last Gender Country Age Math Physics
1         1    James   Zhang   Male      US  23   73      70
2         2   Wilson      Li   Male      UK  26   95     999
3         3  Richard Nuan Ye   Male      UK  35   77      83
4         4     Mary    Deng Female      US  21   60      99
5         5    Jason  Wilson   Male      UK  19   77      89
6         6 Jennifer  Hopkin Female      UK  43   79      64
  Chemistry     Date
1        87 10/31/08
2        83 03/16/08
3        92 05/22/08
4        84 01/24/09
5        93 07/30/09
6        83 04/05/09
#select wanted variables, using indexing with the column numbers
new_df1 <- grade[c(2,3,7,8,9)]
#show first observations of the new data frame
head(new_df1)
#output
     First    Last Math Physics Chemistry
1    James   Zhang   73      70        87
2   Wilson      Li   95     999        83
3  Richard Nuan Ye   77      83        92
4     Mary    Deng   60      99        84
5    Jason  Wilson   77      89        93
6 Jennifer  Hopkin   79      64        83
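
To run the examples in this post without the original file, the six rows printed above can be typed in by hand. The sketch below does that, then looks up the position of each wanted column by name instead of counting by eye. Storing Date as plain text is an assumption, since the post does not show how that column was read in, and `wanted` and `positions` are names introduced here for illustration.

```r
# Rebuild the six rows shown by head(grade); values are copied from the output above.
# Assumption: Date is kept as a character column.
grade <- data.frame(
  StudentID = 1:6,
  First     = c("James", "Wilson", "Richard", "Mary", "Jason", "Jennifer"),
  Last      = c("Zhang", "Li", "Nuan Ye", "Deng", "Wilson", "Hopkin"),
  Gender    = c("Male", "Male", "Male", "Female", "Male", "Female"),
  Country   = c("US", "UK", "UK", "US", "UK", "UK"),
  Age       = c(23, 26, 35, 21, 19, 43),
  Math      = c(73, 95, 77, 60, 77, 79),
  Physics   = c(70, 999, 83, 99, 89, 64),
  Chemistry = c(87, 83, 92, 84, 93, 83),
  Date      = c("10/31/08", "03/16/08", "05/22/08",
                "01/24/09", "07/30/09", "04/05/09"),
  stringsAsFactors = FALSE
)

# Look up the column positions by name rather than counting them by eye
wanted <- c("First", "Last", "Math", "Physics", "Chemistry")
positions <- match(wanted, names(grade))
print(positions)

# Select by position, as in the example above
new_df1 <- grade[positions]
head(new_df1)
```

Here `positions` comes out as 2, 3, 7, 8, 9, the same indices used above. Index-based selection silently picks different variables if the column order ever changes, so a check like this is worth doing before hard-coding the numbers.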

The second way is to put the variable names, as a character vector, inside the square brackets.

#select variables, using column names
new_df2 <- grade[c("First","Last","Math","Physics","Chemistry")]
#show first observations of the new data frame
head(new_df2)
#output
     First    Last Math Physics Chemistry
1    James   Zhang   73      70        87
2   Wilson      Li   95     999        83
3  Richard Nuan Ye   77      83        92
4     Mary    Deng   60      99        84
5    Jason  Wilson   77      89        93
6 Jennifer  Hopkin   79      64        83
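
One detail of name-based selection is easy to trip over when only a single variable is requested: the form without a comma returns a data frame, while the form with a comma returns a plain vector unless `drop = FALSE` is given. A minimal sketch, using a cut-down copy of ‘grade’ (three rows and four columns, copied from the output above) so that it runs on its own:

```r
# Cut-down copy of `grade` for illustration only
grade <- data.frame(
  First   = c("James", "Wilson", "Richard"),
  Last    = c("Zhang", "Li", "Nuan Ye"),
  Math    = c(73, 95, 77),
  Physics = c(70, 999, 83),
  stringsAsFactors = FALSE
)

# Check that every requested name exists before selecting;
# a misspelled name would otherwise stop with "undefined columns selected"
wanted <- c("First", "Math")
missing_cols <- setdiff(wanted, names(grade))
stopifnot(length(missing_cols) == 0)

# No comma: the result is always a data frame
math_df <- grade["Math"]
class(math_df)    # "data.frame"

# With a comma: a single column is simplified to a vector...
math_vec <- grade[, "Math"]
class(math_vec)   # "numeric"

# ...unless drop = FALSE is given
math_keep <- grade[, "Math", drop = FALSE]
class(math_keep)  # "data.frame"
```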

The third method is the opposite of the first one: exclude the unwanted variables by putting a minus sign in front of their index numbers.

#select variables, by excluding unwanted variables
new_df3 <- grade[c(-1,-4,-5,-6,-10)]
#show first observations of the new data frame
head(new_df3)
#output
     First    Last Math Physics Chemistry
1    James   Zhang   73      70        87
2   Wilson      Li   95     999        83
3  Richard Nuan Ye   77      83        92
4     Mary    Deng   60      99        84
5    Jason  Wilson   77      89        93
6 Jennifer  Hopkin   79      64        83
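
The minus sign works only with numeric indices; putting it in front of a character vector of names raises an error. To exclude variables by name, one option is to compare against `names()`. The sketch below uses a cut-down copy of ‘grade’ (values copied from the output above), and `unwanted` is a name introduced here for illustration.

```r
# Cut-down copy of `grade` for illustration only
grade <- data.frame(
  StudentID = 1:3,
  First     = c("James", "Wilson", "Richard"),
  Last      = c("Zhang", "Li", "Nuan Ye"),
  Gender    = c("Male", "Male", "Male"),
  Math      = c(73, 95, 77),
  stringsAsFactors = FALSE
)

unwanted <- c("StudentID", "Gender")

# Option 1: keep the columns whose names are NOT in `unwanted`
new_df3a <- grade[!(names(grade) %in% unwanted)]

# Option 2: compute the names to keep, then select by name
new_df3b <- grade[setdiff(names(grade), unwanted)]

# With numeric indices, one minus sign can cover the whole vector
# (columns 1 and 4 are StudentID and Gender in this cut-down copy)
new_df3c <- grade[-c(1, 4)]

names(new_df3a)
```

All three objects keep First, Last and Math in this cut-down example.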

The fourth method uses the select() function from the dplyr package.

#using select() of dplyr to select variables
library(dplyr)
new_df4 <- select(grade,First,Last,Math,Physics,Chemistry)
#show first observations of the new data frame
head(new_df4)
#output
     First    Last Math Physics Chemistry
1    James   Zhang   73      70        87
2   Wilson      Li   95     999        83
3  Richard Nuan Ye   77      83        92
4     Mary    Deng   60      99        84
5    Jason  Wilson   77      89        93
6 Jennifer  Hopkin   79      64        83
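
select() also accepts ranges of adjacent columns, exclusions, and names held in a character vector. The sketch below assumes dplyr 1.0.0 or later is installed (for `all_of()`) and uses a cut-down copy of ‘grade’ with values copied from the output above; `wanted` is a name introduced here for illustration.

```r
library(dplyr)

# Cut-down copy of `grade` for illustration only
grade <- data.frame(
  StudentID = 1:3,
  First     = c("James", "Wilson", "Richard"),
  Last      = c("Zhang", "Li", "Nuan Ye"),
  Gender    = c("Male", "Male", "Male"),
  Math      = c(73, 95, 77),
  Physics   = c(70, 999, 83),
  Chemistry = c(87, 83, 92),
  stringsAsFactors = FALSE
)

# Ranges of adjacent columns, written as first:last
by_range <- select(grade, First:Last, Math:Chemistry)

# Exclusion: a minus sign in front of a name drops that column
by_exclusion <- select(grade, -StudentID, -Gender)

# Names stored in a character vector, wrapped in all_of()
wanted <- c("First", "Last", "Math", "Physics", "Chemistry")
by_vector <- select(grade, all_of(wanted))

names(by_range)
```

All three calls keep the same five columns, in the same order.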

As you can see, each method returns the same result. The methods mentioned here are not exhaustive, and you can learn more R functions from our YouTube channel.

Published by wilsonzhang746 on June 14, 2024
