Correct model specification v. Orthogonality condition in Inference

I am working through the assumptions of the classical linear regression model in finite samples, and then making the leap to large-sample theory.

But I don't quite understand which distinctions I should draw between exogeneity vs. orthogonality, and between strict vs. weak exogeneity.

Weak exogeneity, $E(\epsilon_i|X_i)=0$, means that, contemporaneously, $X_i$ carries no predictive power for the disturbance term. So if this holds, I have not left out any relevant systematic influences as regressors, and my model is correctly specified.

Then what does strict exogeneity mean? Strict exogeneity is $E(\epsilon|X)=0$. So I suppose this is most useful in the time series case, because not only do we want the model to be correctly specified, but we also want to make sure that no systematic influences from one period affect the disturbances in a different period? For example, if my regressand is how much I spend today, and it depends on how much I spent yesterday (AR(1)), then the conditional mean of today's error term given yesterday's consumption won't be zero? But contemporaneously it would be?

My other big question is about orthogonality. If the model is correctly specified (i.e., weak exogeneity holds), then I understand this implies orthogonality, $E(X\epsilon)=0$. This makes sense to me: if you have truly included all relevant systematic influences on the regressand, so that the error term contains no omitted variable, then $X$ and $\epsilon$ are uncorrelated. But how does zero correlation between $X$ and $\epsilon$ differ from the so-called exogeneity condition? I guess I don't have good intuition for the difference between $E(\epsilon_i|X_i)$ and $E(X\epsilon)$.

Finally, the way I understand the term "correct model specification" is that the true relationship is also linear in the regressors. That is, when I construct a linear regression model for $Y$, the true relationship also looks like $Y=\beta_0+\beta_1X_1+\epsilon$, and I have not omitted any variables. Is this correct?

Appreciate your help!

Best answer

1) The regression problem in the population starts from the following decomposition:
$$ Y = E[Y|X] + \epsilon. $$
From this orthogonal decomposition you get the two properties
$$ E[\epsilon|X] = E[Y - E[Y|X] \,|\, X] = 0, $$
and, by the law of iterated expectations,
$$ E[X\epsilon] = E[E[X\epsilon|X]] = E[XE[\epsilon|X]] = 0. $$

2) The specification issue is whether the structure that you impose on $E[Y|X]$ is indeed the "true" one. For example, if the model that generated the data $\{(x_i, y_i)\}_{i=1}^n$ is bivariate normal, then $E[Y|X=x] = \beta_0 + \beta_1x$ is the right specification. However, if $E[Y|X=x]=\beta x^2$ and you have assumed that $E[Y|X=x] = \beta_0 + \beta_1x$ or $E[Y|X=x] = \beta x$, then you have a misspecification problem: your estimator of $\beta$ will not converge to the true value (you will have a systematic error).
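To make the misspecification point concrete, here is a minimal simulation sketch in Python (standard library only). Everything in it is an illustrative assumption rather than part of the argument above: $X \sim N(0,1)$, unit-variance normal noise, $\beta = 1$, and the hypothetical helper names `simulate` and `ols_line`. The data are generated with $E[Y|X=x]=\beta x^2$, but a straight line is fitted. Under these assumptions the best linear approximation has intercept 1 and slope 0, so the fit should land near those values. The least-squares residual $e$ is orthogonal to $X$ in the sample, yet its average clearly depends on $X$, so $E[Xe]=0$ holds while $E[e|X]=0$ fails.

```python
import random


def simulate(n=20000, beta=1.0, seed=0):
    """Draw (x, y) pairs whose true conditional mean is E[Y|X=x] = beta * x**2.

    Assumptions for illustration only: X ~ N(0, 1), additive N(0, 1) noise.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta * x * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def ols_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x (the misspecified linear model)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def mean(values):
    return sum(values) / len(values)


xs, ys = simulate()
b0, b1 = ols_line(xs, ys)
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Sample analogue of E[X e]: zero up to floating-point rounding,
# because least squares with an intercept forces it.
orthogonality = mean([x * e for x, e in zip(xs, resid)])

# Rough sample analogue of E[e | X] in two regions of X: far from zero,
# because the fitted line misses the curvature of beta * x**2.
resid_center = mean([e for x, e in zip(xs, resid) if abs(x) < 0.5])
resid_tails = mean([e for x, e in zip(xs, resid) if abs(x) > 1.5])

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"mean of x*e = {orthogonality:.2e}")
print(f"mean residual for |x| < 0.5: {resid_center:.3f}")
print(f"mean residual for |x| > 1.5: {resid_tails:.3f}")
```

The residuals should average out negative near $x=0$ and positive in the tails. That is the "systematic error": orthogonality alone is satisfied by any least-squares fit, so it cannot tell you whether the assumed form of $E[Y|X]$ is right.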