
When we have a data set in hand, the first step of data analysis is usually to compute descriptive statistics. The most common descriptive statistics for numerical variables are the mean (average), median, minimum, and maximum. In R, summary() reports all of these, together with the first and third quartiles.

#first six observations of the data frame
head(grade)
  StudentID        Fullname Race Gender Country Age Math Physics
1         1     James Zhang    A   Male      US  23   73      70
2         2       Wilson Li    E Female      UK  26   95      76
3         3 Richard Nuan Ye    A   Male      UK  35   77      83
4         4       Mary Deng    E Female      US  21   60      99
5         5    Jason Wilson    A   Male      UK  19   77      89
6         6 Jennifer Hopkin    A Female      UK  43   79      64
  Chemistry       Date
1        87 10/31/2008
2        83  3/16/2008
3        92  5/22/2008
4        84  1/24/2009
5        93  7/30/2009
6        83   4/5/2009
#Descriptive statistics with summary()
vars <- c("Math", "Physics", "Chemistry")
#descriptive statistics for Math, Physics, and Chemistry
summary(grade[vars])
#output
     Math         Physics        Chemistry    
 Min.   :60.0   Min.   :63.00   Min.   :62.00  
 1st Qu.:72.5   1st Qu.:72.75   1st Qu.:76.75  
 Median :79.0   Median :78.50   Median :83.50  
 Mean   :78.9   Mean   :80.40   Mean   :82.20  
 3rd Qu.:87.0   3rd Qu.:89.25   3rd Qu.:89.75  
 Max.   :95.0   Max.   :99.00   Max.   :96.00  
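
Each number that summary() reports can also be computed on its own with base R functions, which is handy when only one or two of them are needed. The following is a minimal sketch: the vector scores is made up for illustration and is not taken from the grade data set.

```r
# Hypothetical scores, not taken from the grade data set
scores <- c(60, 73, 77, 79, 95)

# Each statistic reported by summary(), computed individually
stats <- c(
  Min    = min(scores),
  Q1     = unname(quantile(scores, 0.25)),  # first quartile
  Median = median(scores),
  Mean   = mean(scores),
  Q3     = unname(quantile(scores, 0.75)),  # third quartile
  Max    = max(scores)
)
print(stats)
```

Calling summary(scores) on the same vector should give the same six values; quantile() uses the same default method that summary() relies on.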

An alternative is the describe() function from the Hmisc package. In addition to the mean and a wider set of quantiles, it lists the number of missing and distinct values and a frequency table showing each value with its count and proportion.

library(Hmisc)
vars <- c("Math", "Physics", "Chemistry")
describe(grade[vars])
#output
grade[vars] 

 3  Variables      20  Observations
---------------------------------------------------------------------
Math 
       n  missing distinct     Info     Mean      Gmd      .05 
      20        0       14    0.994     78.9    11.72    64.75 
     .10      .25      .50      .75      .90      .95 
   65.90    72.50    79.00    87.00    91.40    95.00 
                                                                 
Value        60   65   66   69   71   73   77   79   82   83   87
Frequency     1    1    1    1    1    2    2    3    1    1    2
Proportion 0.05 0.05 0.05 0.05 0.05 0.10 0.10 0.15 0.05 0.05 0.10
                         
Value        90   91   95
Frequency     1    1    2
Proportion 0.05 0.05 0.10

For the frequency table, variable is rounded to the nearest 0
---------------------------------------------------------------------
Physics 
       n  missing distinct     Info     Mean      Gmd      .05 
      20        0       16    0.997     80.4    12.75    63.95 
     .10      .25      .50      .75      .90      .95 
   68.50    72.75    78.50    89.25    93.60    99.00 
                                                                 
Value        63   64   69   70   72   73   76   78   79   83   87
Frequency     1    1    1    1    1    2    2    1    1    2    1
Proportion 0.05 0.05 0.05 0.05 0.05 0.10 0.10 0.05 0.05 0.10 0.05
                                   
Value        89   90   92   93   99
Frequency     1    1    1    1    2
Proportion 0.05 0.05 0.05 0.05 0.10

For the frequency table, variable is rounded to the nearest 0
---------------------------------------------------------------------
Chemistry 
       n  missing distinct     Info     Mean      Gmd      .05 
      20        0       14    0.994     82.2    11.34    64.85 
     .10      .25      .50      .75      .90      .95 
   66.80    76.75    83.50    89.75    93.00    93.15 
                                                                 
Value        62   65   67   69   76   77   82   83   84   87   89
Frequency     1    1    1    1    1    1    1    3    2    2    1
Proportion 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.15 0.10 0.10 0.05
                         
Value        92   93   96
Frequency     2    2    1
Proportion 0.10 0.10 0.05

For the frequency table, variable is rounded to the nearest 0
---------------------------------------------------------------------
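
To see where some of the describe() columns come from, they can be approximated with base R. In the output above, n is the number of non-missing observations, missing and distinct are counts, and Gmd is Gini's mean difference, the mean absolute difference between all pairs of observations. The sketch below is only an illustration under those definitions, not Hmisc's own code, and the vector scores is made up rather than taken from the grade data set.

```r
# Hypothetical scores, not taken from the grade data set
scores <- c(60, 73, 73, 77, 95)

n        <- sum(!is.na(scores))       # number of non-missing observations
missing  <- sum(is.na(scores))        # number of NA values
distinct <- length(unique(scores))    # number of distinct values

# Gini's mean difference: mean absolute difference over all pairs.
# outer() counts each pair twice, so divide by n * (n - 1).
gmd <- sum(abs(outer(scores, scores, "-"))) / (n * (n - 1))

# Frequency and proportion of each distinct value
freq <- table(scores)
prop <- prop.table(freq)

print(c(n = n, missing = missing, distinct = distinct, Gmd = gmd))
print(rbind(Frequency = freq, Proportion = round(prop, 2)))
```

The sketch assumes there are no missing values in scores; with NAs present, they would have to be dropped before the pairwise differences are taken.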
