
Test on an individual coefficient in a multiple linear regression model

We provide effective and affordable training courses for R, Python and Statistics; click here for more details and course registration!

When a multiple linear regression model is estimated, the next step is usually to check the significance of each coefficient. Including an unnecessary variable increases the SSR (regression sum of squares) only slightly, which does not justify the corresponding reduction in the SSE (sum of squared errors). Deciding carefully which variables to keep or remove therefore helps avoid overfitting the model. Most statistical software packages report the significance of each coefficient in the regression output. In this post we show the test of significance more theoretically. First we list some mathematical notation that is important for understanding hypothesis testing in regression analysis. For a multiple regression model,

Multiple linear regression model: y = Xβ + ε

the response vector y is modeled as the product of the independent-variable matrix X and the coefficient vector β, plus the error term ε, which has mean zero and standard deviation σ.
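To see this in practice, here is a minimal sketch (not taken from the post's own example) that fits such a model on simulated data with Python's statsmodels; the summary output contains the t value and p-value for each coefficient. The variable names, sample size and data below are purely illustrative assumptions.

```python
# A minimal sketch, assuming simulated data: fit y = X*beta + epsilon with
# statsmodels and inspect the coefficient t-tests in the summary table.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 13                                   # 13 observations give n - p - 1 = 9 residual df with 3 predictors
X = rng.normal(size=(n, 3))              # columns play the role of X1, X2, X3
beta = np.array([2.0, -1.5, 0.0])        # X3 has no real effect in this simulation
y = 1.0 + X @ beta + rng.normal(size=n)

X_design = sm.add_constant(X)            # add the intercept column
model = sm.OLS(y, X_design).fit()
print(model.summary())                   # t value and p-value for each coefficient
```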

To test the significance of, say, β3 in a model with three independent variables X1, X2, X3, we first formulate the null hypothesis H0 and the alternative hypothesis H1:

H0: β3 = 0 versus H1: β3 ≠ 0

Then we can calculate the t-test statistic

t-test statistic: t = (b3 − 0) / (s √C33)

where b3 is the point estimate of the coefficient β3, 0 is the value specified by the null hypothesis, s is the square root of s², the mean squared error (SSE divided by its degrees of freedom), and C33 is the diagonal element of (X′X)⁻¹ corresponding to β3; these quantities are shown in the (X′X)⁻¹ matrix and the ANOVA table below:

inverse of X’ times X
ANOVA table for Regression

The t-test statistic calculated above follows a t-distribution with 9 degrees of freedom. The value is not significant, because it does not fall in the rejection region (roughly speaking, a significant t-test statistic should be either less than about -2 or greater than about 2). This means the coefficient β3 is not significantly different from zero, so the variable X3 should not be included in the model.
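For readers who want to reproduce the arithmetic, the following sketch computes the t statistic by hand on illustrative simulated data (an assumption, not the post's data): it forms (X′X)⁻¹, takes the diagonal element for β3, and divides b3 by s√C33. The sample size is chosen so that the residual degrees of freedom equal 9, as in the post's example.

```python
# A hedged sketch of the t-test done "by hand" with matrix algebra,
# using simulated data: t = b3 / (s * sqrt(C33)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 13                                                       # n - p - 1 = 9 residual df
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # intercept, X1, X2, X3
y = 1.0 + 2.0 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(size=n) # X3 has no real effect

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                        # least-squares estimates
sse = (y - X @ b) @ (y - X @ b)              # sum of squared errors
s2 = sse / (n - X.shape[1])                  # mean squared error, 9 df here
s = np.sqrt(s2)

c33 = XtX_inv[3, 3]                          # diagonal element of (X'X)^-1 for beta3
t_stat = (b[3] - 0) / (s * np.sqrt(c33))
df = n - X.shape[1]
print(t_stat)
print(stats.t.ppf(0.975, df=df))             # critical value, about 2.26 for 9 df
print(2 * stats.t.sf(abs(t_stat), df=df))    # two-sided p-value
```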

A t-test usually has an F-test counterpart. In linear regression, the idea is to test whether the marginal increase in SSR from adding a variable is large enough relative to the SSE. We fit the model twice: once with all three variables X1, X2, X3, and once with only X1 and X2 (that is, with X3 removed).

Increase in SSR due to X3: SSR(X3 | X1, X2) = SSR(X1, X2, X3) − SSR(X1, X2)

So the increase in SSR due to X3 is the difference between the SSR of these two models. Next we calculate an F-test statistic, which is the SSR due to X3 divided by s². This statistic follows an F distribution with 1 and 9 degrees of freedom. It is not significant because the p-value is much larger than 0.05.

F-test statistic: F = SSR(X3 | X1, X2) / s²

In fact, the F-test statistic is the square of the t-test statistic.
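As a rough illustration on simulated data (again an assumption, not the post's data), the sketch below runs this partial F-test with statsmodels: it fits the full and reduced models, divides the extra SSR by s² from the full model, and confirms that the result equals the square of β3's t statistic.

```python
# A sketch of the partial F-test for X3 on illustrative data: the reduced model
# drops X3, and the resulting F statistic equals the square of beta3's t statistic.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 13
X = rng.normal(size=(n, 3))                          # X1, X2, X3
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(X)).fit()            # intercept, X1, X2, X3
reduced = sm.OLS(y, sm.add_constant(X[:, :2])).fit()  # intercept, X1, X2 only

ssr_gain = full.ess - reduced.ess                     # extra regression SS due to X3
f_stat = ssr_gain / full.mse_resid                    # divide by s^2 from the full model
print(f_stat, stats.f.sf(f_stat, 1, full.df_resid))   # F(1, 9) p-value
print(full.tvalues[3] ** 2)                           # equals f_stat up to rounding
```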

To learn more about mathematical statistics, you can watch the statistics tutorial videos on our YouTube channel.
