OLS: Omitted variable bias when $\mathbb{E}$(omitted variable) $\neq 0$


The standard proof of omitted variable bias is pretty simple:

Assume the true model is $$Y = X\beta + Z\delta + U$$

and we estimate naively $$Y = X\beta + W$$

The OLS estimator from the misspecified model is then $$\hat\beta = [X^TX]^{-1}X^TY = [X^TX]^{-1}X^T[X\beta + Z\delta + U] = \beta + [X^TX]^{-1}X^T[Z\delta + U]$$

Taking the expectation conditional on $X$ (assuming $U$ is mean zero and independent of $X$): $$\mathbb{E}[\hat\beta\vert X] = \beta + [X^TX]^{-1}\mathbb{E}[X^TZ\vert X]\delta$$

The standard proof then shows that our estimate of $\beta$ will be biased if $X$ and $Z$ are correlated.

But what if the means of $X$ and $Z$ are both nonzero? For example, assume $Z$ is independent of $X$ and that $\mathbb{E}X \neq 0 \neq \mathbb{E}Z$.

Then, using independence in the second step, we have $$\mathbb{E}[\hat\beta\vert X] = \beta + [X^TX]^{-1}\mathbb{E}[X^TZ\vert X]\delta = \beta + [X^TX]^{-1}X^T\mathbb{E}[Z]\delta \neq \beta$$

Doesn't that mean, then, that our estimate of $\beta$ can be biased even when the omitted variable is independent of the included regressors, so long as $\mathbb{E}Z \neq 0$?
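This is easy to check with a quick Monte Carlo (a sketch in numpy; the means, coefficients, and sample sizes are arbitrary illustrative choices): regress $Y$ on $X$ alone, with no intercept, where $Z$ is independent of $X$ but both have nonzero means.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
beta, delta = 2.0, 3.0                 # true coefficients
est = np.empty(reps)
for r in range(reps):
    x = rng.normal(1.0, 1.0, n)        # E[X] = 1 (nonzero)
    z = rng.normal(0.5, 1.0, n)        # Z independent of X, E[Z] = 0.5
    u = rng.normal(0.0, 1.0, n)
    y = beta * x + delta * z + u
    est[r] = (x @ y) / (x @ x)         # no-intercept OLS: (X'X)^{-1} X'Y
print(est.mean())                      # clearly above beta = 2
```

With these numbers the average estimate sits near $\beta + \mathbb{E}[X]\mathbb{E}[Z]\delta/\mathbb{E}[X^2] = 2.75$ rather than the true $\beta = 2$, consistent with the derivation above.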

Yes, it is true, as long as you do not include an intercept in the regression (a column of $1$'s in the $X$ matrix).

If you do include an intercept, then

$Y=X\beta+\delta Z+U$

Let's assume that the first column of $X$ is a column of $1$'s. Then

$Y=1_n\beta_1+X_{-1}\beta_{-1}+\delta Z+U$

where $X_{-1}$ denotes the $X$ matrix without the first column (define $\beta_{-1}$ analogously).

It can be shown that this can be rewritten as

$Y=1_n\bar Y+X_{-1}^M\beta_{-1}+\delta Z^M+U^M$

where $Z^M=Z-1_n\bar Z$, and $X_{-1}^M$ and $U^M$ are defined analogously (each variable is centered by subtracting its sample mean).

Now, it follows that the estimator of $\beta_{-1}$ is simply $(X_{-1}^{M\prime}X_{-1}^M)^{-1}X_{-1}^{M\prime}Y$, because each column of $X_{-1}^M$ is orthogonal to $1_n$.
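This equivalence is easy to verify numerically (a sketch with made-up data in numpy): the slope from a regression with an explicit intercept column matches the slope from the centered, no-intercept regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(1.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)

# OLS with an explicit intercept column
X = np.column_stack([np.ones(n), x])
b_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Centered regression, no intercept: (x^M' x^M)^{-1} x^M' y^M
xm = x - x.mean()
slope_centered = (xm @ (y - y.mean())) / (xm @ xm)

print(b_full[1], slope_centered)       # identical up to float precision
```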

Now, because of this orthogonality it also follows that $\hat\beta_{-1}$ is unbiased:

$\mathbb{E}\left[(X_{-1}^{M\prime}X_{-1}^M)^{-1}X_{-1}^{M\prime}Y \mid X\right]=\mathbb{E}\left[(X_{-1}^{M\prime}X_{-1}^M)^{-1}X_{-1}^{M\prime}(1_n\bar Y+X_{-1}^M\beta_{-1}+\delta Z^M+U^M) \mid X\right]=\beta_{-1}$

since $X_{-1}^{M\prime}1_n=0$, $\mathbb{E}[Z^M\mid X]=0$ (by the independence of $Z$ and $X$), and $\mathbb{E}[U^M\mid X]=0$. In other words, the omitted term $\delta\,\mathbb{E}[Z]$ is absorbed into the intercept, so only the slope coefficients on variables actually correlated with $Z$ pick up bias.
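A companion simulation (same illustrative numbers as the no-intercept case, sketched in numpy) shows the intercept absorbing $\delta\,\mathbb{E}[Z]$: with the column of $1$'s included, the slope on $X$ is estimated without bias even though $\mathbb{E}Z \neq 0$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 2000
beta0, beta1, delta = 1.0, 2.0, 3.0
slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(1.0, 1.0, n)
    z = rng.normal(0.5, 1.0, n)        # omitted, independent of x, E[Z] = 0.5
    u = rng.normal(0.0, 1.0, n)
    y = beta0 + beta1 * x + delta * z + u
    X = np.column_stack([np.ones(n), x])   # intercept included this time
    slopes[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
print(slopes.mean())                   # ~ beta1 = 2; delta*E[Z] lands in the intercept
```

The average slope estimate is close to the true $\beta_1 = 2$, while the estimated intercept is inflated by roughly $\delta\,\mathbb{E}[Z] = 1.5$.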