With a data set in hand, the first step of an analysis is usually to compute descriptive statistics. The most common ones for numerical variables are the mean (average), median, minimum, and maximum. In R, summary() reports all of these, along with the first and third quartiles.
#first observations of a data frame
head(grade)
  StudentID        Fullname Race Gender Country Age Math Physics Chemistry       Date
1         1     James Zhang    A   Male      US  23   73      70        87 10/31/2008
2         2       Wilson Li    E Female      UK  26   95      76        83  3/16/2008
3         3 Richard Nuan Ye    A   Male      UK  35   77      83        92  5/22/2008
4         4       Mary Deng    E Female      US  21   60      99        84  1/24/2009
5         5    Jason Wilson    A   Male      UK  19   77      89        93  7/30/2009
6         6 Jennifer Hopkin    A Female      UK  43   79      64        83   4/5/2009
#Descriptive statistics with summary()
vars <- c("Math", "Physics", "Chemistry")
#descriptive of Math, Physics, Chemistry
summary(grade[vars])
#output
      Math         Physics        Chemistry
 Min.   :60.0   Min.   :63.00   Min.   :62.00
 1st Qu.:72.5   1st Qu.:72.75   1st Qu.:76.75
 Median :79.0   Median :78.50   Median :83.50
 Mean   :78.9   Mean   :80.40   Mean   :82.20
 3rd Qu.:87.0   3rd Qu.:89.25   3rd Qu.:89.75
 Max.   :95.0   Max.   :99.00   Max.   :96.00
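When only a few of these statistics are needed, they can also be computed one at a time with base R functions. The sketch below is a minimal illustration on a made-up data frame called scores (a hypothetical stand-in, not the grade data used in this post); sapply() applies the same set of functions to every column.

```r
# Hypothetical toy data; NOT the grade data set used in this post
scores <- data.frame(
  Math    = c(60, 73, 77, 79, 95),
  Physics = c(63, 70, 76, 83, 99)
)

# Apply a handful of base R summary functions to each numeric column.
# The result is a matrix: one row per statistic, one column per variable.
stats <- sapply(scores, function(x) {
  c(Min    = min(x),
    Median = median(x),
    Mean   = mean(x),
    Max    = max(x))
})

print(stats)
```

If a column may contain missing values, each of these functions accepts na.rm = TRUE to drop them before computing.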
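An alternative is the describe() function from the Hmisc package. In addition to the common statistics above, it reports the number of observations, the number of missing and distinct values, Gini's mean difference (Gmd), a wider set of percentiles, and a frequency table of the observed values.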
library(Hmisc)
vars <- c("Math", "Physics", "Chemistry")
describe(grade[vars])
#output
grade[vars]

 3  Variables      20  Observations
---------------------------------------------------------------------
Math
       n  missing distinct     Info     Mean      Gmd      .05
      20        0       14    0.994     78.9    11.72    64.75
     .10      .25      .50      .75      .90      .95
   65.90    72.50    79.00    87.00    91.40    95.00

Value        60   65   66   69   71   73   77   79   82   83   87
Frequency     1    1    1    1    1    2    2    3    1    1    2
Proportion 0.05 0.05 0.05 0.05 0.05 0.10 0.10 0.15 0.05 0.05 0.10

Value        90   91   95
Frequency     1    1    2
Proportion 0.05 0.05 0.10

For the frequency table, variable is rounded to the nearest 0
---------------------------------------------------------------------
Physics
       n  missing distinct     Info     Mean      Gmd      .05
      20        0       16    0.997     80.4    12.75    63.95
     .10      .25      .50      .75      .90      .95
   68.50    72.75    78.50    89.25    93.60    99.00

Value        63   64   69   70   72   73   76   78   79   83   87
Frequency     1    1    1    1    1    2    2    1    1    2    1
Proportion 0.05 0.05 0.05 0.05 0.05 0.10 0.10 0.05 0.05 0.10 0.05

Value        89   90   92   93   99
Frequency     1    1    1    1    2
Proportion 0.05 0.05 0.05 0.05 0.10

For the frequency table, variable is rounded to the nearest 0
---------------------------------------------------------------------
Chemistry
       n  missing distinct     Info     Mean      Gmd      .05
      20        0       14    0.994     82.2    11.34    64.85
     .10      .25      .50      .75      .90      .95
   66.80    76.75    83.50    89.75    93.00    93.15

Value        62   65   67   69   76   77   82   83   84   87   89
Frequency     1    1    1    1    1    1    1    3    2    2    1
Proportion 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.15 0.10 0.10 0.05

Value        92   93   96
Frequency     2    2    1
Proportion 0.10 0.10 0.05

For the frequency table, variable is rounded to the nearest 0
---------------------------------------------------------------------
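To make the extra columns in the describe() output less opaque, here is a rough base R sketch of what most of them mean. It uses a made-up vector x (hypothetical, not taken from the grade data) and is only an approximation of the idea, not Hmisc's actual implementation; in particular, the percentiles rely on quantile()'s default interpolation, which is assumed here rather than checked against Hmisc. Gmd is computed as the mean absolute difference over all pairs of observations. The Info column is left out.

```r
# Hypothetical toy vector with one missing value; NOT the grade data
x <- c(60, 73, 73, 77, 95, NA)

obs      <- x[!is.na(x)]           # non-missing values only
n        <- length(obs)            # "n": number of non-missing observations
missing  <- sum(is.na(x))          # "missing": number of NA values
distinct <- length(unique(obs))    # "distinct": number of unique values

# Percentiles, using quantile()'s default interpolation method
pct <- quantile(obs, probs = c(.05, .10, .25, .50, .75, .90, .95))

# Gini's mean difference: average of |x_i - x_j| over all pairs with i != j
gmd <- sum(abs(outer(obs, obs, "-"))) / (n * (n - 1))

# Frequency and proportion of each distinct value
freq <- table(obs)
prop <- prop.table(freq)

print(list(n = n, missing = missing, distinct = distinct, Gmd = gmd))
print(pct)
print(rbind(Frequency = freq, Proportion = prop))
```

Computing the pieces by hand like this is mainly useful for understanding the report; for routine work, describe() is shorter and keeps everything in one place.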