Which "linear estimators" are actually permitted in the Gauss–Markov Theorem?


Let $y$ be an $n\times 1$ vector and $X$ an $n\times k$ matrix. Suppose that (i) $y=X\beta+\varepsilon$ where $E[\varepsilon|X]=0$, (ii) $X'X$ has rank $k$, and (iii) $E[\varepsilon\varepsilon'|X]=\sigma^{2}I_{n}$.

The famous Gauss-Markov (GM) Theorem says that, under the assumptions above, the OLS estimator $\hat{\beta} = (X'X)^{-1}X'y$ is the minimum variance linear unbiased estimator of $\beta$ in the linear regression equation $y = X\beta +\varepsilon$.
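For concreteness, here is a minimal numerical sketch (my own construction, not from any particular textbook) comparing the exact conditional variances of OLS with another linear unbiased estimator: a weighted least squares estimator with arbitrary "wrong" weights, whose coefficient matrix $C_1$ is a fixed function of $X$ and satisfies $C_1X=I_k$. Under $E[\varepsilon\varepsilon'|X]=\sigma^2 I_n$, any such $Cy$ has $\operatorname{Var}(Cy|X)=\sigma^2 CC'$, and GM says the OLS variance is (weakly) smaller:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2
X = rng.normal(size=(n, k))          # a fixed design matrix (illustrative)

# OLS coefficient matrix C0 = (X'X)^{-1} X'
C0 = np.linalg.solve(X.T @ X, X.T)

# Another linear estimator Cy with C a fixed function of X and C X = I_k:
# weighted least squares with arbitrary weights W (not the true covariance).
W = np.diag(rng.uniform(0.5, 2.0, size=n))
C1 = np.linalg.solve(X.T @ W @ X, X.T @ W)

# Both coefficient matrices satisfy C X = I_k, so both estimators are unbiased.
assert np.allclose(C0 @ X, np.eye(k))
assert np.allclose(C1 @ X, np.eye(k))

# With sigma^2 = 1, Var(Cy|X) = C C'.  Gauss-Markov says C1 C1' - C0 C0'
# is positive semidefinite, so in particular the diagonals obey v0 <= v1.
v0 = np.diag(C0 @ C0.T)
v1 = np.diag(C1 @ C1.T)
print(v0 <= v1 + 1e-12)   # every coordinate-wise OLS variance is smaller
```

The key point for the question below: $C_1$ here is measurable with respect to $X$, which is exactly the property the proofs lean on.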

Check out the Wikipedia proof here (note that Wikipedia assumes $X$ is non-stochastic).

Looking through a variety of textbooks, when the Theorem says that $\hat{\beta}$ is the minimum variance linear unbiased estimator, it appears to mean the minimum variance unbiased estimator among all estimators that can be written as $Cy$ for some $k\times n$ matrix $C$.

However, every proof I have seen seems to implicitly assume that $E[C|X]=C$, i.e., that $C$ is a function of $X$ alone.

To make my case concrete, take the Wikipedia proof (here).

Wikipedia assumes that $X$ is non-stochastic; I will instead condition on $X$, which plays the same role.

If $\hat{\gamma}=Cy$, then defining $D=C-(X'X)^{-1}X'$ we have: $$\hat{\gamma}=(X'X)^{-1}X'y + Dy.$$ Then the first few lines of the proof on Wikipedia can be written as (conditioning on $X$ here):

$$\begin{aligned}E[\hat{\gamma}|X]&=E[Cy|X]\\ &=E\left[\left((X'X)^{-1}X'+D\right)(X\beta +\varepsilon )\,\middle|\,X\right]\\ &=\left((X'X)^{-1}X'+D\right)X\beta +\left((X'X)^{-1}X'+D\right)\operatorname {E} [\varepsilon |X ].\end{aligned}$$

However, the last line only works if $E[D\varepsilon|X]=DE[\varepsilon|X]$, which holds when $D$ (equivalently $C$) is a function of $X$ alone, but not for a general $C$ that may depend on $y$. Where was this assumed in the Gauss-Markov Theorem?
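To see that something genuinely fails when $C$ depends on $y$, here is a small simulation (my own illustrative construction). Take a fixed $D_0$ with $D_0X=0$, so that $\hat{\beta}+D_0y$ would be unbiased, but let $D=y_1D_0$ depend on the first coordinate of $y$. Then $\hat{\gamma}=\hat{\beta}+Dy$ is still of the form $Cy$ with $C=(X'X)^{-1}X'+y_1D_0$, yet it is biased, precisely because $E[D\varepsilon|X]\neq DE[\varepsilon|X]$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, reps = 20, 2, 200_000
X = rng.normal(size=(n, k))          # fixed design, held across replications
beta = np.array([0.5, -0.5])

# Fixed D0 with D0 X = 0, built from the annihilator M = I - X(X'X)^{-1}X'.
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
D0 = rng.normal(size=(k, n)) @ M
assert np.allclose(D0 @ X, 0)

ols_proj = np.linalg.solve(X.T @ X, X.T)   # (X'X)^{-1} X'

eps = rng.normal(size=(reps, n))           # N(0, I) errors, sigma^2 = 1
y = X @ beta + eps                         # reps draws of y

# gamma_hat = beta_hat + D y, where D = y_1 * D0 depends on y, not just X.
beta_hat = y @ ols_proj.T
gamma_hat = beta_hat + y[:, [0]] * (y @ D0.T)

bias = gamma_hat.mean(axis=0) - beta
# Analytically, E[gamma_hat|X] - beta = E[y_1 D0 eps|X] = sigma^2 * D0[:, 0],
# which is generally nonzero: the unbiasedness step in the proof breaks down.
print(bias, D0[:, 0])
```

So if "linear estimator" really meant any $\hat{\gamma}=Cy$, estimators like this one would be admitted, and the proof's step of pulling $D$ outside the conditional expectation would be invalid.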

Again, the situation is not unique to the Wikipedia proof, but is present in some form or another in every proof I have seen.

I am looking for clarification on what estimators we are actually comparing the OLS estimator to in the Gauss–Markov Theorem.