R Programming

Calculate point-biserial and biserial correlations using R


When a correlation, usually the Pearson correlation, is calculated, both variables are normally assumed to be continuous. This does not exclude the case where one of the two variables is dichotomous (binary). Say we want to measure the correlation between height and gender for a group of people: the variable gender takes only two distinct values. The Pearson correlation calculated in this case is called the point-biserial correlation, because the gender variable is strictly 0 or 1. In other cases the binary variable has an underlying continuous characteristic, such as passing or not passing an exam, where a person just below the passing mark is quite different from a person far below it. The correlation that estimates the relationship with this underlying continuous variable is called the biserial correlation.
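As a minimal sketch of the point-biserial case, the code below uses a small set of made-up heights and a 0/1 gender code (illustrative values, not real data). It checks that the ordinary Pearson correlation agrees with the textbook point-biserial formula, (M1 - M0) / s * sqrt(p * q), where s is the standard deviation computed with n in the denominator. The variable names are chosen here only for illustration.

```r
# Made-up illustrative data: heights in cm and gender coded 0/1
height <- c(158, 162, 165, 170, 172, 175, 180, 183)
gender <- c(0, 0, 0, 1, 0, 1, 1, 1)

n  <- length(height)
m1 <- mean(height[gender == 1])   # mean height in the group coded 1
m0 <- mean(height[gender == 0])   # mean height in the group coded 0
p  <- mean(gender == 1)           # proportion coded 1
q  <- 1 - p                       # proportion coded 0

# standard deviation with n (not n - 1) in the denominator
s_n <- sqrt(sum((height - mean(height))^2) / n)

r_formula <- (m1 - m0) / s_n * sqrt(p * q)
r_pearson <- cor(height, gender)

# the two values agree up to floating-point error
all.equal(r_formula, r_pearson)
```

Because the two results coincide, no special function is needed for the point-biserial correlation: cor() or cor.test() on the 0/1 coded variable is enough.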

In R, the function cor.test() from base R and the function polyserial() from the polycor package can be used to calculate point-biserial and biserial correlations, respectively.

In the following example, the variable vs from the dataset mtcars is binary (0 or 1), and the variable mpg is continuous. The point-biserial and biserial correlations are calculated with the cor.test() and polyserial() functions.

The cor.test() result shows a fairly strong positive correlation between vs and mpg (r = 0.66), and the very low p-value (3.416e-05) indicates that it is statistically significant.

#load library
> library(polycor)

#using dataset mtcars
> data("mtcars")

#show variables of mtcars
> str(mtcars)
'data.frame': 32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
#point-biserial correlation between vs and mpg
> cor.test(mtcars$vs, mtcars$mpg)

 Pearson's product-moment correlation

data:  mtcars$vs and mtcars$mpg
t = 4.8644, df = 30, p-value = 3.416e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4103630 0.8223262
sample estimates:
      cor 
0.6640389 
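
The significance test reported by cor.test() for a 0/1 variable is equivalent to a two-sample t-test with pooled variance. The sketch below, which uses only base R and the built-in mtcars data, compares the two; the object names ct and tt are chosen here for illustration.

```r
data("mtcars")

# Pearson correlation test with vs coded 0/1 (the point-biserial correlation)
ct <- cor.test(mtcars$vs, mtcars$mpg)

# Two-sample t-test of mpg between the two vs groups, assuming equal variances
tt <- t.test(mpg ~ vs, data = mtcars, var.equal = TRUE)

# Same p-value and degrees of freedom; the t statistics match in absolute
# value (the sign depends on which group mean is subtracted from which)
all.equal(ct$p.value, tt$p.value)
all.equal(abs(unname(ct$statistic)), abs(unname(tt$statistic)))
```

This is why a significant point-biserial correlation can also be read as a significant difference in mean mpg between the two vs groups.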

#biserial correlation between mpg and vs
#polyserial() expects the numeric variable first and the ordinal variable second
> polyserial(mtcars$mpg, mtcars$vs)
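
For a binary variable, the biserial correlation can also be obtained by hand from the point-biserial correlation with the classical conversion r_b = r_pb * sqrt(p * q) / h, where h is the standard normal density at the point that splits the distribution into proportions p and q. The sketch below uses only base R and assumes that a normally distributed variable underlies vs; it is an illustration of the formula, not a description of how polyserial() is implemented.

```r
data("mtcars")

r_pb <- cor(mtcars$mpg, mtcars$vs)   # point-biserial correlation
p    <- mean(mtcars$vs == 1)         # proportion of cars with vs = 1
q    <- 1 - p                        # proportion of cars with vs = 0

# height of the standard normal density at the point that splits the
# distribution into proportions p and q
h <- dnorm(qnorm(p))

r_b <- r_pb * sqrt(p * q) / h        # biserial correlation
r_b
```

The factor sqrt(p * q) / h is always greater than 1, so the biserial correlation is always larger in absolute value than the point-biserial correlation calculated from the same data.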


wilsonzhang746
