The Linear Regression model is computed well only with uncorrelated variables

The linear regression model is computed well only with uncorrelated variables. When we have highly correlated variables that can be expressed directly through each other, the estimated coefficients become unreliable. Why is that?

Assume that your model is $$ y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \epsilon. $$ The variance of each estimated coefficient can be written as a function of that variable's collinearity with the other explanatory variables. WLOG, let us look at $x_1$ and its coefficient. You can show that $$ \operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{(n-1)s_{X_1}^2}\cdot\frac{1}{1 - R_1^2}, $$ where $\sigma^2$ is the variance of the error term $\epsilon$ (in practice estimated by the residual variance), $s_{X_1}^2$ is the sample variance of $x_1$, and $R_1^2$ is the coefficient of determination from the auxiliary regression $$ x_1=b_0+b_1x_2+b_2x_3+\xi. $$

So the stronger the collinearity of $x_1$ with the other explanatory variables, the closer $R_1^2$ is to 1 and the larger the variance of $\hat{\beta}_1$. The estimate becomes unstable, and you are less likely to conclude that $x_1$ has a significant effect on $y$. The same formula holds for every coefficient, so each regressor that takes part in the collinearity has the variance of its coefficient "inflated". The factor $$ \frac{1}{1-R^2_j} $$ is called the variance inflation factor (VIF) of the variable $x_j$.

If the collinearity is perfect, i.e., $x_j$ is an exact linear combination of the other explanatory variables, then $R_j^2 = 1$ and the variance "explodes": the matrix $X'X$ (with $X$ the design matrix) is not invertible, so the least-squares criterion has no unique minimizer and infinitely many coefficient vectors fit the data equally well.
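
To make the variance formula concrete, here is a minimal NumPy sketch on simulated data. Everything in it is an illustrative assumption rather than part of the answer above: the sample size, the way `x1` is built from `x2` and `x3`, the helper names `r_squared`, `vif` and `coef_variances`, and the use of a known error variance `sigma2` in the numerator of the formula. It computes the inflation factor of `x1` from the auxiliary regression, compares the resulting variance with the corresponding diagonal entry of $\sigma^2 (X'X)^{-1}$, and then checks that the design matrix loses rank when the collinearity is exact.

```python
import numpy as np

def r_squared(y, Z):
    """R^2 of an OLS fit of y on Z (Z must already contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / tss

def vif(X, j):
    """1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the other columns."""
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(X)), others])
    return 1.0 / (1.0 - r_squared(X[:, j], Z))

def coef_variances(X, sigma2):
    """Slope entries of diag(sigma^2 (D'D)^{-1}), with D = [1, X] the design matrix."""
    D = np.column_stack([np.ones(len(X)), X])
    cov = sigma2 * np.linalg.inv(D.T @ D)
    return np.diag(cov)[1:]

rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.9 * x2 + 0.9 * x3 + 0.2 * rng.normal(size=n)  # strongly, but not perfectly, collinear
X = np.column_stack([x1, x2, x3])

sigma2 = 1.0  # assumed known error variance (hypothetical value)
vif1 = vif(X, 0)

# Variance of the x1 coefficient via the inflation-factor formula ...
var_formula = sigma2 / ((n - 1) * x1.var(ddof=1)) * vif1
# ... and via the usual matrix expression; the two agree up to rounding.
var_matrix = coef_variances(X, sigma2)[0]
print(vif1, var_formula, var_matrix)

# Perfect collinearity: x1 is an exact linear combination of x2 and x3,
# so the 4-column design matrix has rank 3 and D'D cannot be inverted.
D_perfect = np.column_stack([np.ones(n), x2 + x3, x2, x3])
rank_perfect = np.linalg.matrix_rank(D_perfect)
print(rank_perfect)
```

Shrinking the noise term in `x1` pushes $R_1^2$ toward 1 and makes both variance values grow together, which is the "inflation" described above.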