R Programming

How to generate descriptive statistics in R


With a data set in hand, the first step of data analysis is usually to compute descriptive statistics. The most common descriptive statistics for numerical variables are the mean (average), median, minimum and maximum. In R, summary() reports these for each variable, along with the first and third quartiles.

# First six observations of the data frame
head(grade)
  StudentID        Fullname Race Gender Country Age Math Physics Chemistry       Date
1         1     James Zhang    A   Male      US  23   73      70        87 10/31/2008
2         2       Wilson Li    E Female      UK  26   95      76        83  3/16/2008
3         3 Richard Nuan Ye    A   Male      UK  35   77      83        92  5/22/2008
4         4       Mary Deng    E Female      US  21   60      99        84  1/24/2009
5         5    Jason Wilson    A   Male      UK  19   77      89        93  7/30/2009
6         6 Jennifer Hopkin    A Female      UK  43   79      64        83   4/5/2009
# Descriptive statistics with summary()
vars <- c("Math", "Physics", "Chemistry")
# Summarize the Math, Physics and Chemistry columns
summary(grade[vars])
#output
     Math         Physics        Chemistry    
 Min.   :60.0   Min.   :63.00   Min.   :62.00  
 1st Qu.:72.5   1st Qu.:72.75   1st Qu.:76.75  
 Median :79.0   Median :78.50   Median :83.50  
 Mean   :78.9   Mean   :80.40   Mean   :82.20  
 3rd Qu.:87.0   3rd Qu.:89.25   3rd Qu.:89.75  
 Max.   :95.0   Max.   :99.00   Max.   :96.00  
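
summary() does not report the standard deviation, so it can be useful to build a small table of statistics by hand. The sketch below is one possible way to do that with sapply(). It assumes a stand-in data frame, grade6, that contains only the six rows printed by head(grade) above, and a helper named desc_stats that is hypothetical and not part of R. Because the full data set has 20 observations, the numbers it prints will differ from the summary() output shown above.

```r
# Stand-in data: only the six rows printed by head(grade) above.
# The full data set has 20 rows, so results will not match the article's output.
grade6 <- data.frame(
  Math      = c(73, 95, 77, 60, 77, 79),
  Physics   = c(70, 76, 83, 99, 89, 64),
  Chemistry = c(87, 83, 92, 84, 93, 83)
)

# Hypothetical helper (not a base R function): the statistics named in the text,
# plus the standard deviation, which summary() does not report.
desc_stats <- function(x) {
  c(mean = mean(x), median = median(x), min = min(x), max = max(x), sd = sd(x))
}

# sapply() applies the helper to each column and returns a matrix
# with one row per statistic and one column per variable.
stats <- sapply(grade6, desc_stats)
print(round(stats, 2))
```

With the real data, the same call would be sapply(grade[vars], desc_stats).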

An alternative is the describe() function from the Hmisc package. Besides the mean and a set of quantiles, its output lists the number of observations, missing values and distinct values, and a frequency table showing how often each value occurs.

library(Hmisc)
vars <- c("Math", "Physics", "Chemistry")
describe(grade[vars])
#output
grade[vars] 

 3  Variables      20  Observations
---------------------------------------------------------------------
Math 
       n  missing distinct     Info     Mean      Gmd      .05 
      20        0       14    0.994     78.9    11.72    64.75 
     .10      .25      .50      .75      .90      .95 
   65.90    72.50    79.00    87.00    91.40    95.00 
                                                                 
Value        60   65   66   69   71   73   77   79   82   83   87
Frequency     1    1    1    1    1    2    2    3    1    1    2
Proportion 0.05 0.05 0.05 0.05 0.05 0.10 0.10 0.15 0.05 0.05 0.10
                         
Value        90   91   95
Frequency     1    1    2
Proportion 0.05 0.05 0.10

For the frequency table, variable is rounded to the nearest 0
---------------------------------------------------------------------
Physics 
       n  missing distinct     Info     Mean      Gmd      .05 
      20        0       16    0.997     80.4    12.75    63.95 
     .10      .25      .50      .75      .90      .95 
   68.50    72.75    78.50    89.25    93.60    99.00 
                                                                 
Value        63   64   69   70   72   73   76   78   79   83   87
Frequency     1    1    1    1    1    2    2    1    1    2    1
Proportion 0.05 0.05 0.05 0.05 0.05 0.10 0.10 0.05 0.05 0.10 0.05
                                   
Value        89   90   92   93   99
Frequency     1    1    1    1    2
Proportion 0.05 0.05 0.05 0.05 0.10

For the frequency table, variable is rounded to the nearest 0
---------------------------------------------------------------------
Chemistry 
       n  missing distinct     Info     Mean      Gmd      .05 
      20        0       14    0.994     82.2    11.34    64.85 
     .10      .25      .50      .75      .90      .95 
   66.80    76.75    83.50    89.75    93.00    93.15 
                                                                 
Value        62   65   67   69   76   77   82   83   84   87   89
Frequency     1    1    1    1    1    1    1    3    2    2    1
Proportion 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.15 0.10 0.10 0.05
                         
Value        92   93   96
Frequency     2    2    1
Proportion 0.10 0.10 0.05

For the frequency table, variable is rounded to the nearest 0
---------------------------------------------------------------------
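
Several of the fields in the describe() output can be approximated with base R, which may help in reading the report. The sketch below is not how Hmisc computes them internally; it only assumes the default settings of quantile() and table(). The vector x holds the six Math scores printed by head(grade) above, not the full 20 observations, so its results will differ from the output shown.

```r
# Stand-in data: the six Math scores printed by head(grade) above.
x <- c(73, 95, 77, 60, 77, 79)

n        <- sum(!is.na(x))            # number of non-missing observations
missing  <- sum(is.na(x))             # number of missing values
distinct <- length(unique(x))         # number of distinct values

# Same probability levels as the describe() report (.05 to .95),
# computed with quantile()'s default method.
q <- quantile(x, probs = c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95))

freq <- table(x)                      # counts per value ("Frequency" row)
prop <- prop.table(freq)              # shares per value ("Proportion" row)

print(c(n = n, missing = missing, distinct = distinct))
print(q)
print(rbind(Frequency = freq, Proportion = round(prop, 2)))
```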

wilsonzhang746
