
Aggregating data using aggregate() in R


Summary statistics are a routine part of data analysis. For example, you may want the mean cost and mean income for each year. In R, the function aggregate() provides an easy way to calculate summary statistics of variables by group in a data frame. The basic form of the function is aggregate(df, by, FUN), where

df is a data frame,

by is a list of variables used for grouping, and

FUN is the function applied to each group.
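As a minimal sketch of this basic form, the following uses a small made-up data frame. The names `scores`, `group` and `value` are hypothetical and are not part of the tutorial's data set. It computes the mean of one column for each group.

```r
# Hypothetical toy data, for illustration only
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 30, 50)
)

# aggregate(df, by, FUN): df holds the columns to summarise,
# by is a list of grouping vectors, FUN is applied to each group
res <- aggregate(scores["value"], by = list(group = scores$group), FUN = mean)
res
```

Here the mean of `value` is 15 for group "a" and 40 for group "b".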

In the following code example, we first create a data frame ‘test’ containing the mathematics, physics and chemistry test scores of different students. We then use aggregate() to calculate the mean scores by ‘Gender’ and ‘Country’.

# set the working directory
setwd("d:\\RStatistics-Tutorial")

# read the grade data, specifying the class of each column
vartype <- c("character", "character", "character", "character", "character",
             "numeric", "numeric", "numeric", "numeric", "character")

grade <- read.table("University-Fullname-full.csv", colClasses = vartype,
                    header = TRUE, sep = ",")

# create the data frame 'test' from columns 4, 5 and 7 to 9
test <- grade[, c(4, 5, 7:9)]
test$Gender <- as.factor(test$Gender)
test$Country <- as.factor(test$Country)

# show the first observations of data frame 'test'
head(test)
# output
  Gender Country Math Physics Chemistry
1   Male      US   73      70        87
2 Female      UK   95      76        83
3   Male      UK   77      83        92
4 Female      US   60      99        84
5   Male      UK   77      89        93
6 Female      UK   79      64        83

# aggregate the data: calculate the mean of all variables
# by Gender and Country
# (mean() returns NA, with a warning, for the factor columns Gender and Country)
agg <- aggregate(test, by = list(test$Gender, test$Country),
                 FUN = mean, na.rm = TRUE)

# show the result
agg
  Group.1 Group.2 Gender Country  Math  Physics Chemistry
1  Female      UK     NA      NA 88.75 74.25000  81.75000
2    Male      UK     NA      NA 76.50 82.75000  86.25000
3  Female      US     NA      NA 78.00 86.42857  82.85714
4    Male      US     NA      NA 74.20 75.00000  78.40000

The result contains the mean mathematics, physics and chemistry scores for each combination of Gender and Country. However, the grouping columns are labelled ‘Group.1’ and ‘Group.2’ by default, and the original ‘Gender’ and ‘Country’ columns contain only NA. aggregate() lets you customize the group labels in the resulting data frame. In the following code example, we do this by naming the elements of the list() passed to by, and we drop the two redundant columns ‘Gender’ and ‘Country’ by indexing the data frame with [-c(1,2)].

# a better solution: remove the redundant variables
# Gender and Country and customize the column names for the groups

# use aggregate() to calculate mean scores
# for Math, Physics and Chemistry by Gender and Country
agg <- aggregate(test[-c(1, 2)],
                 by = list(Gender = test$Gender,
                           Country = test$Country),
                 FUN = mean, na.rm = TRUE)

# show the result
agg

  Gender Country  Math  Physics Chemistry
1 Female      UK 88.75 74.25000  81.75000
2   Male      UK 76.50 82.75000  86.25000
3 Female      US 78.00 86.42857  82.85714
4   Male      US 74.20 75.00000  78.40000
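As an alternative not used in this tutorial, aggregate() also has a formula interface that keeps the grouping variable names without a named list. The sketch below uses a small made-up data frame, since the CSV file is not included here. The names `toy` and `agg2` and all the values are hypothetical.

```r
# Hypothetical toy data with column names similar to 'test'
toy <- data.frame(
  Gender  = factor(c("Male", "Female", "Male", "Female")),
  Country = factor(c("US", "US", "UK", "UK")),
  Math    = c(70, 80, 90, 60),
  Physics = c(65, 75, 85, 95)
)

# Formula interface: columns to summarise go on the left of ~ (bound with cbind),
# grouping variables go on the right; the group columns keep their own names
agg2 <- aggregate(cbind(Math, Physics) ~ Gender + Country, data = toy, FUN = mean)
agg2
```

Note that the formula method drops rows containing missing values by default (na.action = na.omit), so its results can differ from the by = list(...) form with na.rm = TRUE when the data contain NA.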

You can also watch R tutorial videos on our YouTube channel.

wilsonzhang746
