Multiple linear regression and the betas


I have a question about multiple linear regression.

$$Y^j=\beta_1 X_1^j+\beta_2 X_2^j +\cdots +\beta_p X_p^j + \epsilon^j \tag{*}$$

When I test the significance of the $\beta_i$ with a Student's t or Fisher F statistic (using, for instance, Excel or R), some of the $\beta_i$ come out non-significant, meaning that, according to the test, some of the betas are equal to zero and some are not.

My question is simple: do I have to remove from the model (*) the terms whose betas are equal to zero, which gives a new model without them (or is it still the same model)?

Thanks in advance, I hope everyone understands me!

There are 1 best solutions below

That is a model selection question and in general there are many approaches. Asking on https://stats.stackexchange.com/ will probably give you multiple detailed answers.

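When you do a t-test or an F-test for a single variable, you test whether its coefficient is zero "in the presence of all the other variables". Dropping all the insignificant ones at once changes the model, and there is no clear-cut answer as to whether that is justified. One check is a partial F-test of the reduced model against the full one (the null being that all the dropped coefficients are zero): keep the reduced model if you do not reject the null. You can also do an F-test for the overall significance of the reduced model (i.e., test whether all remaining non-intercept coefficients are zero), in which case you keep the model if you do reject the null.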
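To make the "drop several at once" check concrete, here is a minimal sketch of a partial F-test that compares a reduced model with the full one. It assumes NumPy, simulated data, and no intercept (as in the model (*) of the question); the helper names `rss` and `partial_f` are made up for illustration. The statistic is $F = \frac{(\mathrm{RSS}_{red}-\mathrm{RSS}_{full})/q}{\mathrm{RSS}_{full}/(n-p)}$, where $q$ is the number of dropped coefficients.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_f(X_full, y, keep):
    """F statistic for H0: the coefficients of all dropped columns are zero."""
    n, p = X_full.shape
    q = p - len(keep)                       # number of dropped coefficients
    rss_full = rss(X_full, y)
    rss_red = rss(X_full[:, keep], y)
    f_stat = ((rss_red - rss_full) / q) / (rss_full / (n - p))
    return f_stat, q, n - p                 # statistic and its two degrees of freedom

# Simulated data (illustrative only): y depends on columns 0 and 2, not on 1 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)

f_drop_noise, q, df_resid = partial_f(X, y, keep=[0, 2])   # drop the irrelevant columns
f_drop_signal, _, _ = partial_f(X, y, keep=[1, 3])         # drop the relevant columns
```

The statistic is compared with an $F(q, n-p)$ distribution, for instance from a table or from a statistics library. A small value (as one would expect for `f_drop_noise`) gives no evidence against the reduced model, while a huge value (as for `f_drop_signal`) says the dropped variables mattered.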

There are many other (and better) approaches for model selection, i.e., picking which covariates to include in the model. You can, for example, use all-subset regression ($\ell_0$ penalty) together with any of these measures: AIC, BIC, Mallows' $C_p$, adjusted $R^2$, or the prediction error sum of squares (PRESS). You can also use the Lasso or concave penalties (like the MCP) to avoid going over all subsets.
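As an illustration of all-subset regression, the sketch below scores every non-empty subset of columns with BIC and keeps the best one. It is only a toy under stated assumptions: NumPy, simulated data, no intercept, a Gaussian BIC written up to an additive constant, and made-up helper names `bic` and `best_subset`. With $p$ covariates it fits $2^p-1$ models, which is exactly why penalties like the Lasso are used for larger $p$.

```python
from itertools import combinations
import numpy as np

def bic(X, y):
    """Gaussian BIC up to an additive constant: n*log(RSS/n) + k*log(n)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return n * np.log(resid @ resid / n) + k * np.log(n)

def best_subset(X, y):
    """Score every non-empty subset of columns and return the one with the lowest BIC."""
    p = X.shape[1]
    subsets = (s for k in range(1, p + 1) for s in combinations(range(p), k))
    return min(subsets, key=lambda s: bic(X[:, list(s)], y))

# Simulated data (illustrative only): only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)

chosen = best_subset(X, y)   # tuple of selected column indices
```

Swapping `bic` for AIC (replace `k*log(n)` by `2*k`) or another criterion from the list only changes the scoring function; the search itself stays the same.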

You can also use stepwise (forward and/or backward) regression, which basically uses the t or F test in a sequential manner to include the most significant variables one at a time (or to eliminate the least significant one sequentially).
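A rough sketch of the backward variant, assuming NumPy, simulated data, and no intercept: refit, drop the variable with the smallest $|t|$, and repeat until every remaining $|t|$ clears a threshold. The threshold of 2.0 is an assumption standing in for the two-sided 5% critical value when the residual degrees of freedom are large; a careful version would use p-values from the t distribution. The names `t_stats` and `backward_eliminate` are hypothetical.

```python
import numpy as np

def t_stats(X, y):
    """OLS t statistics: each coefficient divided by its estimated standard error."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)        # unbiased estimate of the noise variance
    return beta / np.sqrt(sigma2 * np.diag(xtx_inv))

def backward_eliminate(X, y, threshold=2.0):
    """Repeatedly drop the column with the smallest |t| until all |t| >= threshold."""
    cols = list(range(X.shape[1]))
    while cols:
        t = np.abs(t_stats(X[:, cols], y))
        worst = int(np.argmin(t))
        if t[worst] >= threshold:
            break                            # every remaining variable is significant
        cols.pop(worst)                      # remove the least significant one and refit
    return cols

# Simulated data (illustrative only): only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)

kept = backward_eliminate(X, y)   # indices of the columns that survive
```

Note that the model is refit after each removal, so the t statistics are always "in the presence of" the variables still in the model, which is the point of doing it one at a time.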

These are not the only approaches!