Proof of Gauss-Markov theorem


Theorem: Let $Y=X\beta+\varepsilon$ where $$Y\in\mathcal M_{n\times 1}(\mathbb R),$$ $$X\in \mathcal M_{n\times p}(\mathbb R),$$ $$\beta\in\mathcal M_{p\times 1}(\mathbb R ),$$ and $$\varepsilon\in\mathcal M_{n\times 1}(\mathbb R ).$$

We suppose that $X$ has full rank $p$ and that $$\mathbb E[\varepsilon]=0\quad\text{and}\quad \text{Var}(\varepsilon)=\sigma ^2I.$$ Then the least squares estimator (i.e. $\hat\beta=(X^TX)^{-1}X^TY$) is the best linear unbiased estimator of $\beta$; that is, for any linear unbiased estimator $\tilde\beta$ of $\beta$, it holds that $$\text{Var}(\tilde\beta)-\text{Var}(\hat\beta)\geq 0,$$ where the inequality means that the difference of the covariance matrices is positive semidefinite.
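For concreteness, here is a minimal numerical sketch of how $\hat\beta=(X^TX)^{-1}X^TY$ is computed. It assumes NumPy; the design matrix, the coefficient values, the noise level and the seed are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 3                         # n observations, p regressors
X = rng.normal(size=(n, p))           # design matrix, full column rank (a.s.)
beta = np.array([2.0, -1.0, 0.5])     # "true" coefficients (illustrative values)
eps = rng.normal(scale=0.3, size=n)   # noise with E[eps] = 0, Var(eps) = sigma^2 I
Y = X @ beta + eps

# Least squares estimator: beta_hat = (X^T X)^{-1} X^T Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)                       # close to beta for this sample
```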

Proof

Let $\tilde\beta$ be a linear unbiased estimator, i.e. $$\tilde\beta=AY\ \ \text{for some }A\in\mathcal M_{p\times n}(\mathbb R)\quad\text{and}\quad\mathbb E[\tilde\beta]=\beta\text{ for all }\beta\in\mathbb R ^p.$$

Questions:

1) Why should $\mathbb E[\tilde\beta]=\beta$ hold for all $\beta$? I don't really understand this point. To me $\beta$ is fixed, so requiring $\mathbb E[\tilde\beta]=\beta$ for all $\beta$ doesn't really make sense.

2) Actually, what is the difference between the least squares estimator and the maximum likelihood estimator? They are both $\hat\beta=(X^TX)^{-1}X^TY$, so if they are the same, I don't really see why we give them two different names.


There are 3 answers below.

Answer 1

1) The condition $\mathbb{E}[\tilde{\beta}]=\beta$ for all $\beta\in\mathbb R^p$ is just the condition "the estimator is unbiased" written in mathematical form: since the true $\beta$ is unknown, we require the estimator to be correct on average no matter which value $\beta$ happens to take, and that is why the condition is quantified over all $\beta$. For example, for the least squares estimator, $$ \begin{align} \mathbb{E}[\hat{\beta}] &= \mathbb{E}[(X^{\rm T}X)^{-1}X^{\rm T}Y]\\ &= \mathbb{E}[(X^{\rm T}X)^{-1}X^{\rm T}(X\beta+\epsilon)]\\ &= \beta+(X^{\rm T}X)^{-1}X^{\rm T}\,\mathbb{E}[\epsilon]\\ &= \beta, \end{align} $$ whatever the value of $\beta$, and thus the least squares estimator is unbiased. Note that this uses the assumption that the noise has zero mean; in general, not every estimator of the form $\tilde{\beta}=AY+D$ is unbiased. (A small simulation sketch after this answer illustrates the unbiasedness numerically.)

2) Maximum likelihood and least squares are equivalent under certain conditions, namely if you assume the noise $\epsilon$ is Gaussian: in that case the log-likelihood of $\beta$ is, up to constants, $-\frac{1}{2\sigma^2}\|Y-X\beta\|^2$, so maximizing it is the same as minimizing the sum of squared residuals. Change the noise distribution and the two estimators will in general no longer coincide.
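As a complement to point 1, here is a small simulation sketch of unbiasedness: averaging $\hat\beta=AY$ over many independent noise draws (with the design $X$ held fixed) recovers $\beta$. It assumes NumPy; the dimensions, the parameter values and the number of replications are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

n, p, sigma = 50, 2, 1.0
X = rng.normal(size=(n, p))          # fixed design matrix
beta = np.array([1.5, -0.7])         # fixed (in practice unknown) parameter
A = np.linalg.inv(X.T @ X) @ X.T     # the p x n matrix defining the OLS estimator

n_sims = 20000
estimates = np.empty((n_sims, p))
for i in range(n_sims):
    eps = rng.normal(scale=sigma, size=n)   # noise with E[eps] = 0
    Y = X @ beta + eps
    estimates[i] = A @ Y                    # beta_hat for this simulated sample

print(estimates.mean(axis=0))   # approximately equal to beta (unbiasedness)
```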

Answer 2

The Gauss-Markov theorem states that, under the usual assumptions, the OLS estimator $\beta_{OLS}$ is BLUE (Best Linear Unbiased Estimator). To prove this, take an arbitrary linear unbiased estimator $\bar{\beta}$ of $\beta$. Since it is linear, we can write $\bar{\beta} = Cy$ in the model $y = X\beta + \varepsilon$. Furthermore, since it is unbiased, $\mathbb{E} [ \bar{\beta} ] =C\mathbb{E}[y] = CX\beta= \beta$ for every $\beta$, which holds only when $CX=I$, with $I$ the identity matrix.

Then: \begin{align*} \operatorname{Var}[\bar{\beta}] &= \operatorname{Var}[Cy] \\ &= C \operatorname{Var}[y]C'\\ &= \sigma^2 CC' \\ &\geq \sigma^2 CP_XC' \\ &= \sigma^2 CX(X'X)^{-1}X'C' \\ &= \sigma^2 (X'X)^{-1} \\ &= \operatorname{Var}[\beta_{OLS}], \end{align*} where $P_X = X(X'X)^{-1}X'$ is the projection matrix onto the column space of $X$. The inequality (in the positive semidefinite sense) holds because $I - P_X$ is symmetric and idempotent, hence positive semidefinite, so $C(I-P_X)C' \geq 0$; the last two equalities use $CX = I$.
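Here is a quick numerical illustration of this inequality, again assuming NumPy and made-up data. Writing a competing linear unbiased estimator as $Cy$ with $C = (X'X)^{-1}X' + D$ and $DX = 0$ is one convenient way to generate an example; the difference of the covariance matrices (up to the common factor $\sigma^2$) then has nonnegative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 40, 3
X = rng.normal(size=(n, p))
XtX_inv = np.linalg.inv(X.T @ X)
C_ols = XtX_inv @ X.T                      # OLS: beta_hat = C_ols @ y
P = X @ XtX_inv @ X.T                      # projection matrix P_X

# Another linear unbiased estimator: C = C_ols + D with D @ X = 0.
M = rng.normal(size=(p, n))                # arbitrary p x n matrix
D = M @ (np.eye(n) - P)                    # D @ X = 0 because P @ X = X
C = C_ols + D

# Covariance matrices, up to the common factor sigma^2:
var_tilde = C @ C.T                        # Var of the competing estimator
var_ols = XtX_inv                          # Var of the OLS estimator

diff = var_tilde - var_ols
print(np.linalg.eigvalsh(diff))            # all eigenvalues >= 0 (up to rounding)
```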

Answer 3

The Gauss-Markov Theorem is actually telling us that in a regression model where the error terms have expected value zero, $E(\epsilon_{i}) = 0$, have constant and finite variance, $\sigma^{2}(\epsilon_{i}) = \sigma^{2} < \infty$, and $\epsilon_{i}$ and $\epsilon_{j}$ are uncorrelated for all $i \neq j$, the least squares estimators $b_{0}$ and $b_{1}$ are unbiased and have minimum variance among all unbiased linear estimators. Note that there might be biased estimators which have an even lower variance.

Extensive information about the Gauss-Markov Theorem, such as its mathematical proof, can be found here: http://economictheoryblog.com/2015/02/26/markov_theorem/

However, if you want to know which assumptions are necessary for $b_1$ to be an unbiased estimator of $\beta_1$, I guess that assumptions 1 to 4 of the following post (http://economictheoryblog.com/2015/04/01/ols_assumptions/) must be fulfilled to have an unbiased estimator.

Furthermore, it is true that the maximum likelihood estimator and the least squares estimator are equivalent under certain conditions, i.e. if the noise $\epsilon$ is Gaussian distributed.

Hope this helps.
