R Programming

How to sort datasets in R


The data frame is the most widely used object type in R data analysis because it stores data of different modes in tabular form. Each row of a data frame represents one observation, and each column represents a variable recorded for every observation. When data is collected and read into a data frame, the order of the observations may not suit the analyst's needs, so the data frame often has to be sorted by the value(s) of one or several variables. In R, the function order() makes this straightforward: it returns the row positions in sorted order, and those positions are then used to index the data frame. In the following code example, the data frame ‘grade’ is sorted by student age, from youngest to oldest.

#set working directory
setwd("d:\\RStatistics-Tutorial")
#read csv file into a data frame
vartype <- c("character", "character", "character", "character", "character",
             "numeric", "numeric", "numeric", "numeric", "character")
grade <- read.table("University-NA.csv", colClasses = vartype, header = TRUE, sep = ",")
#show data frame
grade
#output
   StudentID    First     Last Gender Country Age Math Physics
1          1    James    Zhang   Male      US  23   73      70
2          2   Wilson       Li   Male      UK  26   95     999
3          3  Richard  Nuan Ye   Male      UK  35   77      83
4          4     Mary     Deng Female      US  21   60      99
5          5    Jason   Wilson   Male      UK  19   77      89
6          6 Jennifer   Hopkin Female      UK  43   79      64
7          7     Kari  Gjendem Female      US  37   87      99
8          8   Wenche     Dale Female      US  28   95      87
9          9     Jane   Larsen Female      US  19   73      92
10        10  Steinar   Hansen   Male      UK  25   66      93
11        11  Michael     Chen   Male      UK  42   83      90
12        12    Josef   Curton   Male      US  32   71      63
13        13 Jennifer    Jones   Male      US  27   79      76
14        14     Gary    Grant Female      UK  35   90      78
15        15     Phil      Yao   Male      UK  21   69      69
16        16     Nora   Spears Female      US  29   79      83
17        17    Goril Nordmann Female      UK  36   91      79
18        18     Lisa  Bondvik Female      US  39   65      73
19        19     Guri    Olsen Female      US  24   87      72
20        20   Martin    Jones   Male      US  25   82      73
   Chemistry     Date
1         87 10/31/08
2         83 03/16/08
3         92 05/22/08
4         84 01/24/09
5         93 07/30/09
6         83 04/05/09
7         67 11/24/08
8         93 10/02/08
9         84 06/05/09
10       999 08/01/08
11        77 10/24/08
12        96 11/08/09
13        82 10/29/08
14        92 10/24/08
15        83 10/15/08
16        76 03/11/09
17        69 05/24/08
18        87 07/09/09
19        89 08/12/09
20        62      999

#sort grade from youngest student to oldest student
test <- grade[order(grade$Age),]
#show data frame
test
#output 
   StudentID    First     Last Gender Country Age Math Physics
5          5    Jason   Wilson   Male      UK  19   77      89
9          9     Jane   Larsen Female      US  19   73      92
4          4     Mary     Deng Female      US  21   60      99
15        15     Phil      Yao   Male      UK  21   69      69
1          1    James    Zhang   Male      US  23   73      70
19        19     Guri    Olsen Female      US  24   87      72
10        10  Steinar   Hansen   Male      UK  25   66      93
20        20   Martin    Jones   Male      US  25   82      73
2          2   Wilson       Li   Male      UK  26   95     999
13        13 Jennifer    Jones   Male      US  27   79      76
8          8   Wenche     Dale Female      US  28   95      87
16        16     Nora   Spears Female      US  29   79      83
12        12    Josef   Curton   Male      US  32   71      63
3          3  Richard  Nuan Ye   Male      UK  35   77      83
14        14     Gary    Grant Female      UK  35   90      78
17        17    Goril Nordmann Female      UK  36   91      79
7          7     Kari  Gjendem Female      US  37   87      99
18        18     Lisa  Bondvik Female      US  39   65      73
11        11  Michael     Chen   Male      UK  42   83      90
6          6 Jennifer   Hopkin Female      UK  43   79      64
   Chemistry     Date
5         93 07/30/09
9         84 06/05/09
4         84 01/24/09
15        83 10/15/08
1         87 10/31/08
19        89 08/12/09
10       999 08/01/08
20        62      999
2         83 03/16/08
13        82 10/29/08
8         93 10/02/08
16        76 03/11/09
12        96 11/08/09
3         92 05/22/08
14        92 10/24/08
17        69 05/24/08
7         67 11/24/08
18        87 07/09/09
11        77 10/24/08
6         83 04/05/09
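
To see why the row numbers on the left appear shuffled in the output above, it helps to look at what order() returns on its own. The following is a minimal sketch on a small made-up data set (the names `ages`, `students`, `idx` and `sorted` are illustrative and not part of the original example):

```r
# Hypothetical toy data, not the 'grade' data frame used in the article
ages <- c(23, 26, 35, 21, 19)

# order() returns the positions that would put the vector in ascending order
idx <- order(ages)
idx         # 5 4 1 2 3
ages[idx]   # 19 21 23 26 35

# The same index vector reorders the rows of a data frame
students <- data.frame(First = c("James", "Wilson", "Richard", "Mary", "Jason"),
                       Age   = ages,
                       stringsAsFactors = FALSE)
sorted <- students[idx, ]

# The original row numbers are carried along as row names
rownames(sorted)   # "5" "4" "1" "2" "3"
```

Because order() only produces an index, the original data frame is left unchanged; the sorted copy exists only once the indexed result is assigned to a new name.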

Sorting can be based on more than one variable. In the following example, the observations in the data frame are sorted by gender first and then, within each gender, by age.

#sorts by Gender first, then by Age within each gender
test <- grade[order(grade$Gender, grade$Age),]
#show data frame
test
#output
   StudentID    First     Last Gender Country Age Math Physics
9          9     Jane   Larsen Female      US  19   73      92
4          4     Mary     Deng Female      US  21   60      99
19        19     Guri    Olsen Female      US  24   87      72
8          8   Wenche     Dale Female      US  28   95      87
16        16     Nora   Spears Female      US  29   79      83
14        14     Gary    Grant Female      UK  35   90      78
17        17    Goril Nordmann Female      UK  36   91      79
7          7     Kari  Gjendem Female      US  37   87      99
18        18     Lisa  Bondvik Female      US  39   65      73
6          6 Jennifer   Hopkin Female      UK  43   79      64
5          5    Jason   Wilson   Male      UK  19   77      89
15        15     Phil      Yao   Male      UK  21   69      69
1          1    James    Zhang   Male      US  23   73      70
10        10  Steinar   Hansen   Male      UK  25   66      93
20        20   Martin    Jones   Male      US  25   82      73
2          2   Wilson       Li   Male      UK  26   95     999
13        13 Jennifer    Jones   Male      US  27   79      76
12        12    Josef   Curton   Male      US  32   71      63
3          3  Richard  Nuan Ye   Male      UK  35   77      83
11        11  Michael     Chen   Male      UK  42   83      90
   Chemistry     Date
9         84 06/05/09
4         84 01/24/09
19        89 08/12/09
8         93 10/02/08
16        76 03/11/09
14        92 10/24/08
17        69 05/24/08
7         67 11/24/08
18        87 07/09/09
6         83 04/05/09
5         93 07/30/09
15        83 10/15/08
1         87 10/31/08
10       999 08/01/08
20        62      999
2         83 03/16/08
13        82 10/29/08
12        96 11/08/09
3         92 05/22/08
11        77 10/24/08
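
The same multi-key sort can be tried without the CSV file. The sketch below uses a small made-up data frame (`toy` and `by_gender_age` are hypothetical names) and wraps the call in with(), which is one possible way to avoid repeating the data frame name for every key:

```r
# Hypothetical toy data for illustration only
toy <- data.frame(Gender = c("Male", "Female", "Male", "Female", "Male"),
                  Age    = c(25, 19, 19, 25, 21),
                  stringsAsFactors = FALSE)

# The first key (Gender) is compared first; Age only breaks ties within a gender
by_gender_age <- toy[with(toy, order(Gender, Age)), ]
by_gender_age

# Optionally reset the row names if the original positions are no longer needed
rownames(by_gender_age) <- NULL
by_gender_age
```

Resetting the row names is a matter of taste: keeping them shows where each row came from, while resetting them gives a clean 1 to n numbering.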

Numerical variables can also be sorted from the largest value to the smallest by negating them inside order(). The next example sorts the data frame by gender first and then, within each gender, from the oldest student to the youngest.

#sorts by Gender first, then by Age within each gender
#from oldest to youngest student
test <- grade[order(grade$Gender, -grade$Age),]
#show data frame
test
#output
   StudentID    First     Last Gender Country Age Math Physics
6          6 Jennifer   Hopkin Female      UK  43   79      64
18        18     Lisa  Bondvik Female      US  39   65      73
7          7     Kari  Gjendem Female      US  37   87      99
17        17    Goril Nordmann Female      UK  36   91      79
14        14     Gary    Grant Female      UK  35   90      78
16        16     Nora   Spears Female      US  29   79      83
8          8   Wenche     Dale Female      US  28   95      87
19        19     Guri    Olsen Female      US  24   87      72
4          4     Mary     Deng Female      US  21   60      99
9          9     Jane   Larsen Female      US  19   73      92
11        11  Michael     Chen   Male      UK  42   83      90
3          3  Richard  Nuan Ye   Male      UK  35   77      83
12        12    Josef   Curton   Male      US  32   71      63
13        13 Jennifer    Jones   Male      US  27   79      76
2          2   Wilson       Li   Male      UK  26   95     999
10        10  Steinar   Hansen   Male      UK  25   66      93
20        20   Martin    Jones   Male      US  25   82      73
1          1    James    Zhang   Male      US  23   73      70
15        15     Phil      Yao   Male      UK  21   69      69
5          5    Jason   Wilson   Male      UK  19   77      89
   Chemistry     Date
6         83 04/05/09
18        87 07/09/09
7         67 11/24/08
17        69 05/24/08
14        92 10/24/08
16        76 03/11/09
8         93 10/02/08
19        89 08/12/09
4         84 01/24/09
9         84 06/05/09
11        77 10/24/08
3         92 05/22/08
12        96 11/08/09
13        82 10/29/08
2         83 03/16/08
10       999 08/01/08
20        62      999
1         87 10/31/08
15        83 10/15/08
5         93 07/30/09
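
The minus sign works here because Age is numeric; a character column such as Last cannot be negated directly. Two possible workarounds are sketched below on made-up data (`toy`, `opt1` and `opt2` are hypothetical names): passing one `decreasing` flag per key, which order() accepts only together with method = "radix", or converting the strings to numeric ranks with xtfrm() before negating.

```r
# Hypothetical toy data for illustration only
toy <- data.frame(Gender = c("Male", "Female", "Male", "Female"),
                  Last   = c("Chen", "Dale", "Yao", "Olsen"),
                  stringsAsFactors = FALSE)

# Option 1: one 'decreasing' flag per sort key (requires method = "radix")
opt1 <- toy[order(toy$Gender, toy$Last,
                  decreasing = c(FALSE, TRUE), method = "radix"), ]

# Option 2: xtfrm() maps the strings to numeric ranks, which can be negated
opt2 <- toy[order(toy$Gender, -xtfrm(toy$Last)), ]

opt1$Last   # "Olsen" "Dale" "Yao" "Chen"
opt2$Last   # "Olsen" "Dale" "Yao" "Chen"
```

Note that the radix method compares strings in the C locale, so the two options may disagree on data with mixed upper and lower case or accented characters.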


wilsonzhang746
