
When you analyze a data frame in R, you often need to select specific rows (observations). The simplest way to do this is to put the row indices inside square brackets, before the comma.

#show the first six rows of the data frame
head(grade)
#select first 5 rows of data frame, and save to a new object
testdf <- grade[1:5,]
testdf
#output
  StudentID   First    Last Gender Country Age Math Physics
1         1   James   Zhang   Male      US  23   73      70
2         2  Wilson      Li   Male      UK  26   95     999
3         3 Richard Nuan Ye   Male      UK  35   77      83
4         4    Mary    Deng Female      US  21   60      99
5         5   Jason  Wilson   Male      UK  19   77      89
  Chemistry     Date
1        87 10/31/08
2        83 03/16/08
3        92 05/22/08
4        84 01/24/09
5        93 07/30/09
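
Row indices do not have to be a contiguous range. The sketch below rebuilds a few columns of the five rows shown above as a stand-in for `grade` (the full data set is not reproduced here) and shows non-adjacent indices, negative indices, and what happens when the comma is left out. The object names `picked`, `without_first_two` and `first_two_cols` are only illustrative.

```r
# Stand-in for `grade`, using values from the five rows printed above
grade <- data.frame(
  StudentID = 1:5,
  First = c("James", "Wilson", "Richard", "Mary", "Jason"),
  Gender = c("Male", "Male", "Male", "Female", "Male"),
  Age = c(23, 26, 35, 21, 19),
  stringsAsFactors = FALSE
)

# non-adjacent rows: pass a vector of indices
picked <- grade[c(1, 3, 5), ]

# negative indices drop the listed rows and keep the rest
without_first_two <- grade[-(1:2), ]

# without the comma, the indices select columns instead of rows
first_two_cols <- grade[1:2]

picked
without_first_two
names(first_two_cols)
```

The comma is what tells R that the indices refer to rows; everything after the comma is left empty so that all columns are kept.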

A logical condition can also be used for row selection. In the following example, male students aged 23 or older are selected.

testdf <- grade[(grade$Gender == "Male") & (grade$Age >= 23), ]
testdf
#output
  StudentID    First    Last Gender Country Age Math Physics
1          1    James   Zhang   Male      US  23   73      70
2          2   Wilson      Li   Male      UK  26   95     999
3          3  Richard Nuan Ye   Male      UK  35   77      83
10        10  Steinar  Hansen   Male      UK  25   66      93
11        11  Michael    Chen   Male      UK  42   83      90
12        12    Josef  Curton   Male      US  32   71      63
13        13 Jennifer   Jones   Male      US  27   79      76
20        20   Martin   Jones   Male      US  25   82      73
   Chemistry     Date
1         87 10/31/08
2         83 03/16/08
3         92 05/22/08
10       999 08/01/08
11        77 10/24/08
12        96 11/08/09
13        82 10/29/08
20        62      999
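
The same selection can be written with base R's `subset()`, which looks up column names inside the data frame directly. The sketch below again uses a small stand-in for `grade` built from rows shown earlier. The missing age assigned to one student is a hypothetical value added only to illustrate how the two approaches treat `NA`.

```r
# Stand-in for `grade`, using values from rows printed earlier
grade <- data.frame(
  StudentID = 1:5,
  First = c("James", "Wilson", "Richard", "Mary", "Jason"),
  Gender = c("Male", "Male", "Male", "Female", "Male"),
  Age = c(23, 26, 35, 21, 19),
  stringsAsFactors = FALSE
)

# bracket form, as in the example above
by_bracket <- grade[grade$Gender == "Male" & grade$Age >= 23, ]

# subset() form: no need to repeat `grade$`
by_subset <- subset(grade, Gender == "Male" & Age >= 23)

# hypothetical missing value, for illustration only
grade$Age[5] <- NA

# the bracket form keeps an all-NA row where the condition is NA
with_na <- grade[grade$Gender == "Male" & grade$Age >= 23, ]

# subset() treats an NA condition as FALSE and drops the row
without_na <- subset(grade, Gender == "Male" & Age >= 23)

nrow(with_na)
nrow(without_na)
```

If the data may contain missing values, `subset()` or wrapping the condition in `which()` avoids the extra rows of `NA`.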

The Date variable can be converted to date format, and then observations whose test date falls within a specified range can be selected.

#transform Date variable into date format
grade$Date <- as.Date(grade$Date, "%m/%d/%y") 
#create starting and ending date
date1 <- as.Date("2008-06-01") 
date2 <- as.Date("2008-12-31") 
#observations with testing date in the range are selected
testdf <- grade[which(grade$Date >= date1 & 
                            grade$Date <= date2),]

testdf
#output 
  StudentID    First    Last Gender Country Age Math Physics
1          1    James   Zhang   Male      US  23   73      70
7          7     Kari Gjendem Female      US  37   87      99
8          8   Wenche    Dale Female      US  28   95      87
10        10  Steinar  Hansen   Male      UK  25   66      93
11        11  Michael    Chen   Male      UK  42   83      90
13        13 Jennifer   Jones   Male      US  27   79      76
14        14     Gary   Grant Female      UK  35   90      78
15        15     Phil     Yao   Male      UK  21   69      69
   Chemistry       Date
1         87 2008-10-31
7         67 2008-11-24
8         93 2008-10-02
10       999 2008-08-01
11        77 2008-10-24
13        82 2008-10-29
14        92 2008-10-24
15        83 2008-10-15
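
Some entries in the Date column, such as the `999` shown for student 20 in the earlier output, do not match the `%m/%d/%y` format, so `as.Date()` turns them into `NA`. The sketch below uses a small stand-in built from dates printed earlier to show why wrapping the condition in `which()` matters in that case. The object names `with_which` and `without_which` are only illustrative.

```r
# Stand-in for `grade`, using StudentID and Date values printed earlier
grade <- data.frame(
  StudentID = c(1, 2, 3, 4, 5, 20),
  Date = c("10/31/08", "03/16/08", "05/22/08",
           "01/24/09", "07/30/09", "999"),
  stringsAsFactors = FALSE
)

# "999" does not match the format, so it becomes NA
grade$Date <- as.Date(grade$Date, format = "%m/%d/%y")

date1 <- as.Date("2008-06-01")
date2 <- as.Date("2008-12-31")

in_range <- grade$Date >= date1 & grade$Date <= date2

# which() returns only the positions that are TRUE, so NA is skipped
with_which <- grade[which(in_range), ]

# a bare logical index keeps an all-NA row for the unparsed date
without_which <- grade[in_range, ]

with_which
without_which
```

Checking `sum(is.na(grade$Date))` after the conversion is a quick way to see how many dates failed to parse.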

You can learn more R functions from our YouTube channel.

