A Pearson correlation is usually calculated between two continuous variables. This requirement does not exclude the case where one of the two variables is dichotomous (binary). For example, if we want to measure the correlation between height and gender in a group of people, the gender variable takes clearly dichotomous values. This kind of Pearson correlation is called the point-biserial correlation, because the binary variable is coded strictly as 0 or 1. In other cases the binary variable has an underlying continuous characteristic, such as passing or failing an exam, where the outcome comes from cutting a continuous score at a threshold. The correlation estimated for this case is called the biserial correlation. The difference is that a person who scored very close to the passing mark is quite different from a person who scored far from it, even though both fall into the same category.
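To make the point-biserial idea concrete, here is a minimal sketch in base R. The height and gender values are made up for illustration only, and the 0/1 coding is an arbitrary assumption. It shows that the Pearson correlation with a 0/1 variable matches the usual group-mean form of the point-biserial formula.

```r
# Hypothetical data for illustration only (not from any real study)
height <- c(160, 165, 158, 172, 180, 175, 168, 183)  # heights in cm
gender <- c(0, 0, 0, 0, 1, 1, 1, 1)                   # arbitrary 0/1 coding

# Point-biserial correlation: ordinary Pearson correlation with the 0/1 variable
r_pearson <- cor(gender, height)

# Same quantity from the group-mean form of the point-biserial formula
n   <- length(height)
p   <- mean(gender == 1)                 # proportion coded 1
q   <- 1 - p                             # proportion coded 0
m1  <- mean(height[gender == 1])         # mean height in group 1
m0  <- mean(height[gender == 0])         # mean height in group 0
s_n <- sqrt(sum((height - mean(height))^2) / n)  # SD using n, not n - 1
r_formula <- (m1 - m0) / s_n * sqrt(p * q)

# The two values agree up to floating-point error
all.equal(r_pearson, r_formula)
```

Swapping which group is coded 1 only flips the sign of the correlation, not its magnitude.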
In R, the function cor.test() and the function polyserial() from the polycor package can be used to calculate point-biserial and biserial correlations, respectively.
In the following example, the variable vs from the mtcars dataset is binary (0 or 1), and the variable mpg is continuous. The point-biserial and biserial correlations are calculated with the cor.test() and polyserial() functions.
The cor.test() result shows a fairly strong positive relationship between vs and mpg (r = 0.66), and the very low p-value (3.416e-05) indicates that it is statistically significant.
# load library
> library(polycor)
# using dataset mtcars
> data("mtcars")
# show variables of mtcars
> str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
# point-biserial correlation between vs and mpg
> cor.test(mtcars$vs, mtcars$mpg)

	Pearson's product-moment correlation

data:  mtcars$vs and mtcars$mpg
t = 4.8644, df = 30, p-value = 3.416e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4103630 0.8223262
sample estimates:
      cor 
0.6640389 

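# biserial correlation between vs and mpg
# note: polycor documents polyserial(x, y) with x as the numeric variable
# and y as the ordinal variable; in the call below vs is passed first,
# so mpg is treated as the ordinal variable
> polyserial(mtcars$vs, mtcars$mpg)
[1] 0.7176074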
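As a cross-check on the point-biserial result, the following sketch (base R only; the object names ct and tt are just illustrative) compares cor.test() with a pooled-variance two-sample t-test of mpg by vs. The two procedures test the same hypothesis, so the t statistic should agree in magnitude and the degrees of freedom and p-value should match.

```r
# Point-biserial correlation test, as in the example above
ct <- cor.test(mtcars$vs, mtcars$mpg)

# Two-sample t-test of mpg between the vs groups, assuming equal variances
tt <- t.test(mpg ~ vs, data = mtcars, var.equal = TRUE)

# Same magnitude of t (the sign depends on which group is subtracted from which)
all.equal(abs(unname(tt$statistic)), abs(unname(ct$statistic)))

# Same degrees of freedom and p-value
all.equal(unname(tt$parameter), unname(ct$parameter))
all.equal(tt$p.value, ct$p.value)
```

Note that the default t.test() uses the Welch correction (var.equal = FALSE), which would not reproduce the cor.test() statistic exactly.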