Why is $E(u)=0$ when an intercept is included in OLS Estimation?


I am reading Wooldridge's graduate econometrics text. There he states that when estimating the equation $y=\mathbf{x\beta}+u$ by OLS, if an intercept (constant term) is included in your $\mathbf{x}$ vector, so that $\mathbf{x}=(1,x_1,...,x_k)$, where $y$ and the other $x_1,...,x_k$ are random variables, then we have automatically that $E(u)=0$. I am trying to see why.

Later on, when the textbook introduces OLS, one of the assumptions is that $E(\mathbf{x}^{\top}u)=\mathbf{0}$. Now note that if $\mathbf{x}$ contains an intercept, then this statement implies that $E(u)=0$. However I believe this assumption is not needed to know that when an intercept is included, then $E(u)=0$, rather we have $E(u)=0$ "for free" as the textbook says.

I found a similar question (whose title is not representative of its content) here: Why the expected value of the error when doing regression by OLS is 0? It says the constant effectively "absorbs" $E(u)$, making $E(u)=0$. How does this work, in theory?

My questions:

  1. How do we know that an intercept "absorbs" $E(u)$ exactly -- no more, no less?

  2. Does this rely on the assumption $E(\mathbf{x}^{\top}u)=\mathbf{0}$?

Thanks!


Edit My econometrics instructor notes that if we include an intercept, so that \begin{equation}y=\beta_0+\mathbf{x\beta}+u,\end{equation} where $E(u)=\alpha\ne0$, then we can always rewrite the first equation as \begin{align*} y&=(\alpha+\beta_0)+\mathbf{x\beta}+(u-\alpha) \\ &=(\alpha+\beta_0)+\mathbf{x\beta}+\tilde{u}, \end{align*} where $\tilde{u}=u-\alpha$ and $E(\tilde{u})=0$.

However, this still doesn't explain how we obtain $E(u)$, so that we can "include" it in our intercept. (We can only estimate $E(u)$, but how do we do that?)

I know that if the only regressor is the constant $x_0=1$ (i.e., we drop $x_1,...,x_k$), then what really happens is $\hat{\beta}_0=(\mathbf{1}^{\top}\mathbf{1})^{-1}\mathbf{1}^{\top}\mathbf{y}=\bar{y}$, the sample mean of $y$... Am I getting there?
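To see the "absorption" numerically, here is a small simulation sketch (my own illustration, not from Wooldridge; all numbers are made up): generate data whose structural error has nonzero mean $\alpha$, fit OLS with a constant regressor, and observe that the estimated intercept lands near $\beta_0+\alpha$ while the fitted residuals average exactly zero.

```python
import numpy as np

# Hypothetical parameters: true intercept beta0, slope beta1,
# and an error term with nonzero mean alpha.
rng = np.random.default_rng(0)
n = 10_000
alpha, beta0, beta1 = 1.5, 2.0, 3.0

x = rng.normal(size=n)
u = alpha + rng.normal(size=n)          # E(u) = alpha != 0
y = beta0 + beta1 * x + u

X = np.column_stack([np.ones(n), x])    # include the constant regressor
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

print(beta_hat[0])    # close to beta0 + alpha = 3.5: intercept absorbs E(u)
print(resid.mean())   # zero up to floating point: residuals sum to zero
```

Note that the slope estimate is unaffected; only the intercept picks up $\alpha$, which is exactly the instructor's rewriting with $\tilde u = u - \alpha$.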


Best answer

We don't know the expected value of the error term, and we do not argue that it equals the estimated value of the constant term, since the constant term may also be estimating a "shift factor" anticipated by theory.

Assume that no such shift factor is postulated; then theory would tell you to specify

$$y=\mathbf{x'\beta}+u$$

with no constant term. Hmmm, are you sure that the error term has zero mean? Isn't it possible that the model you are implementing has somehow "got it wrong", and apart from the regressors included in $\mathbf x$, there are other variables that influence $y$ in the real world? If that is the case, you should expect $E(u) \neq 0$. Then write $u \equiv u-E(u) + E(u)$, so you are looking at

$$y=\mathbf{x'\beta} + u-E(u) + E(u) \Rightarrow y=E(u) + \mathbf{x'\beta} + (u-E(u)) $$

Of course $E(u)$ is unknown, but it is a constant. So setting $E(u) \equiv \beta_0$, you obtain

$$y=\beta_0 + \mathbf{x'\beta} + (u-E(u)) $$

But you can estimate $\beta_0$ by OLS, while the error term of this rewritten model is $(u-E(u))$, which has expected value zero by construction.
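Conversely, if you omit the constant when $E(u)\neq 0$ (or when the true intercept is nonzero), nothing forces the residuals to average zero. A small contrast sketch (my own hypothetical numbers, same idea as above):

```python
import numpy as np

# Data with a genuine intercept; x has nonzero mean so the omission matters.
rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(loc=1.0, size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

# With a constant: the normal equation 1'(y - X b) = 0 forces mean(resid) = 0.
X1 = np.column_stack([np.ones(n), x])
r1 = y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]

# Without a constant: only x'(y - x b) = 0 is imposed; the mean residual
# is not constrained and here stays well away from zero.
b = np.linalg.lstsq(x[:, None], y, rcond=None)[0]
r0 = y - x * b[0]

print(abs(r1.mean()))   # ~ 0 up to floating point
print(abs(r0.mean()))   # noticeably nonzero
```

This is the sample-level counterpart of the answer's point: the constant regressor is exactly what makes "mean-zero errors" harmless as an assumption.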

The "intercept" is not the known regressor consisting of the series of ones; the intercept is the unknown coefficient attached to that regressor. But since this regressor is a series of ones, we do not write it explicitly in the specification, which should be

$$y_i = 1\cdot \beta_0 + \mathbf x_i' \beta + u_i$$ and in matrix notation

$$\mathbf y = \mathbf 1 \beta_0 + \mathbf X\beta + \mathbf u$$

where $\mathbf 1 = (1,1,1,...,1)'$.
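A complementary sample-level fact may be worth adding (this is just the standard first-order-condition argument, not something specific to the population statement above): the OLS normal equations for this specification are
$$\begin{pmatrix}\mathbf 1' \\ \mathbf X'\end{pmatrix}\left(\mathbf y - \mathbf 1\hat\beta_0 - \mathbf X\hat\beta\right) = \mathbf 0,$$
and the first row reads
$$\mathbf 1'\hat{\mathbf u} = \sum_{i=1}^n \hat u_i = 0,$$
so whenever the constant regressor is included, the fitted residuals average exactly zero in every sample, regardless of whether $E(u)=0$ holds in the population. This is the finite-sample mirror of the "absorption" argument: the estimated intercept $\hat\beta_0$ adjusts so that this equation holds.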