R Programming

Estimating a simple linear regression model using lm() in R


Linear regression is widely used to model the relationship between a response (dependent) variable and one or more explanatory (independent) variables. The model is linear in its parameters. When only one explanatory variable is assumed in the model, it is called simple linear regression.

Simple linear regression model

In R programming, the lm() function is perhaps the most widely used way to estimate a linear regression model. The basic form of the call is:

m_fit <- lm(formula, data = df)

where

df is the data frame the variables come from,

formula specifies the response and explanatory variables, with the ~ symbol separating the response from the explanatory variables and the + symbol separating multiple explanatory variables. The following code estimates a simple linear regression between the response variable ‘weight’ and the explanatory variable ‘height’ from the data set ‘women’, which ships with the R base installation. The result of lm() is assigned to a list object m_fit.

m_fit <- lm(weight ~ height, data=women)
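To illustrate the + syntax with more than one explanatory variable, here is a brief sketch using the built-in ‘mtcars’ data set (this data set and these variables are an illustration, not part of the original example):

```r
# Multiple regression: mpg explained by weight (wt) and horsepower (hp),
# both from the built-in mtcars data set
m_multi <- lm(mpg ~ wt + hp, data = mtcars)

# coef() returns the point estimates: (Intercept), wt, and hp
coef(m_multi)
```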

After the model is estimated, we can display the estimation results with the summary() function.

> summary(m_fit)

#result
Call:
lm(formula = weight ~ height, data = women)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7333 -1.1333 -0.3833  0.7417  3.1167 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
height        3.45000    0.09114   37.85 1.09e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.525 on 13 degrees of freedom
Multiple R-squared:  0.991, Adjusted R-squared:  0.9903 
F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

The output of summary() includes the coefficient estimates, the residuals, and the R-squared values. Residuals are the differences between the observed values of the response variable and the fitted values from the linear model. In the coefficients section, in addition to the point estimates of the two coefficients (intercept – beta0, height – beta1), the standard errors as well as the corresponding t-statistics and p-values are listed. Both coefficients are significant because their p-values are very close to zero.
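The quantities described above can be extracted directly from the fitted model object. As a check, the residuals are exactly the observed responses minus the fitted values:

```r
m_fit <- lm(weight ~ height, data = women)

# Extract point estimates, fitted values, and residuals
coef(m_fit)              # beta0 (intercept) and beta1 (height)
head(fitted(m_fit))      # fitted values from the model
head(residuals(m_fit))   # observed minus fitted

# Verify: residuals equal the observed response minus the fitted values
all.equal(unname(residuals(m_fit)),
          unname(women$weight - fitted(m_fit)))
```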

The last part of the summary() output reports two R-squared values, multiple and adjusted. Both are close to 0.99, indicating that almost 99% of the variation in the response variable is explained by the explanatory variable. In a simple linear regression model, this value also equals the square of the correlation coefficient between the two variables. Finally, the output reports the F-statistic and its corresponding p-value. A near-zero p-value likewise indicates that the model has significant power to explain the response variable using the explanatory variable.
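In the one-predictor case, the claim that R-squared equals the squared correlation coefficient can be checked directly:

```r
m_fit <- lm(weight ~ height, data = women)

# Multiple R-squared as reported by summary()
r2 <- summary(m_fit)$r.squared

# Square of the Pearson correlation between the two variables
r2_from_cor <- cor(women$weight, women$height)^2

# The two agree up to numerical precision
all.equal(r2, r2_from_cor)
```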

