Interpreting OLS Regression Coefficients with High Multicollinearity


I am having trouble understanding the interpretation of OLS coefficients when predictors are highly correlated. My understanding of OLS coefficients is that they estimate the change in the expected outcome following a 1-unit increase in the predictor, holding all other predictors constant. However, I cannot see why this doesn't lead to underestimated predictions when both positively correlated predictors increase together.

To give a concrete example:

Suppose income = constant + 0.9*innate intelligence + 0.05*skills acquired through education + random variation

but we can only observe performance on an IQ test and years of education,

where years of education = 0.1*innate intelligence + random variation

and performance on the IQ test = innate intelligence + random variation.

Would OLS underestimate the expected salary of someone who has both a high IQ-test score and many years of education, since the predictive power of either predictor, holding the other constant, would be small?

BEST ANSWER

I think part of the confusion here is that you are expecting the parameters in the multiple regression to be the same as the parameters in either univariate regression. This is not the case when the predictors are correlated. In other words, it might be that $E[Y|X_1]=\beta_1X_1$ and $E[Y|X_2]=\beta_2X_2$; however, that does not mean in general that $E[Y|X_1,X_2]=\beta_1X_1+\beta_2X_2$.

A good way to gain some intuition about multicollinearity is to consider the extreme case where you have two predictors $X_1$ and $X_2$ such that $X_1=X_2$. Suppose that $Y=\beta X_1$ (note: no error term). Now suppose you don't know the two predictors are identical and attempt to fit a model like $$Y=\beta_1X_1+\beta_2X_2+\epsilon.$$ What you will find is that there is no unique solution. You could have $(\hat{\beta_1},\hat{\beta_2})=(\beta,0)$ or $(\hat{\beta_1},\hat{\beta_2})=(0,\beta)$, or any mixture of the form $\hat{\beta_1}=\alpha\beta,\ \hat{\beta_2}=(1-\alpha)\beta$. What you could not have as a valid solution, however, is $(\hat{\beta_1},\hat{\beta_2})=(\beta,\beta)$, even though that is what you would get if you estimated the two parameters separately using two univariate regression models.

A similar thing happens in more realistic cases where you have correlated predictors and random error. The weight ends up being split between the two predictors. The OLS estimates become very sensitive to small changes in the data and can end up putting too much weight on one or the other predictor, resulting in the high variances majmun mentioned.
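You can see this splitting directly with a small numpy sketch (all numbers here are made up for illustration): bootstrap refits of a regression with two near-duplicate predictors show the individual coefficients swinging wildly while their sum stays pinned near the true combined effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Two almost-identical predictors (a made-up illustration of near-collinearity).
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.01, n)        # nearly a copy of x1
y = 2.0 * x1 + rng.normal(0, 0.1, n)    # the truth depends only on x1

# Refit OLS on bootstrap resamples to see how the estimates move.
betas = []
for _ in range(20):
    idx = rng.integers(0, n, n)
    X = np.column_stack([x1[idx], x2[idx]])
    b, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
    betas.append(b)
betas = np.array(betas)

print(betas.std(axis=0))        # individual coefficients vary a lot across refits
print(betas.sum(axis=1).std())  # but their sum is far more stable, staying near 2
```

The spread of each coefficient is orders of magnitude larger than the spread of their sum, which is the "weight splitting" described above: the data pin down the combined effect well, but not how it divides between the two predictors.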

So to answer your question: no, it will not produce a systematic underestimate; the fitted values remain reliable even though the individual coefficients are not.
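A quick simulation of the question's setup supports this (the noise scales and the use of a top-decile cutoff are my own made-up assumptions; only the 0.9 and 0.05 coefficients come from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent variable and the two correlated observed proxies (noise scales assumed).
intelligence = rng.normal(100, 15, n)
education = 0.1 * intelligence + rng.normal(0, 1, n)   # years of education
iq_test = intelligence + rng.normal(0, 5, n)           # observed IQ score
income = 0.9 * intelligence + 0.05 * education + rng.normal(0, 5, n)

# OLS of income on the two observed, correlated proxies (plus an intercept).
X = np.column_stack([np.ones(n), iq_test, education])
beta, *_ = np.linalg.lstsq(X, income, rcond=None)
pred = X @ beta

# Average prediction error for people who are high on BOTH proxies.
high = (iq_test > np.quantile(iq_test, 0.9)) & (education > np.quantile(education, 0.9))
print((income[high] - pred[high]).mean())   # close to 0: no systematic underestimate
```

People in the top decile of both the IQ score and years of education are predicted essentially without bias, even though the fitted coefficients need not match the structural 0.9 and 0.05.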

ANOTHER ANSWER

I cannot answer your question fully, particularly that last point, but I will state what the econometrics textbook I have at hand says (I don't have enough reputation to simply comment).

As you say, we interpret OLS coefficients as the impact of a one-unit increase, holding all other independent variables fixed. I'm not completely sure, but I believe this would still be the interpretation in the presence of multicollinearity. However, according to the textbook, "if two explanatory variables are significantly related, then the OLS computer program will find it difficult to distinguish the effects of one variable from the effects of the other" (Studenmund, Using Econometrics).

I take this to mean that the coefficients will not be accurate; that is, we interpret them the same way, but we cannot trust the estimates to be correct (and therefore our interpretations may be incorrect).

For your second question, "Would OLS underestimate...", I don't know the exact answer. However, I believe it may underestimate, but it may also overestimate; I don't see why it would only underestimate. The predictive power of the coefficients is not small; rather, the predictive power of one may be attributed to the other. Indeed, when there is multicollinearity, the variances of the estimates (and thus the standard errors) increase. To truly answer your question I would need to be better versed in the mathematics behind OLS than I currently am.

The link here at stats stackexchange may be useful to you. You may also want to try asking this question there, as I do not know how well received it will be here. One last thing to note, which perhaps is not said at the link, is that multicollinearity does not affect the overall fit of the equation very much.