How do I show that the sum of the OLS residuals is always zero, using matrices?


I am trying to show that $$\sum_{i=1}^ne_i = 0$$

using matrices (or vectors).

I have two hints, so to speak:

$$ HX = X$$ where $H$ is the hat matrix, and that $$\sum_{i=1}^ne_i = e'1$$

My previous solution, in "In OLS is the vector of residuals always 0?", is wrong because I expanded $Y = X\beta$ and left out the error term. If I include the error term, I end up with a tautology.

BEST ANSWER

$$ \operatorname{E} Y = X\beta $$ where

  • $X$ is an $n\times k$ matrix, typically with $n\gg k$, and one of the columns of $X$ is a column of $1$s. (If no column of $X$ is a column in which all entries are equal, then the proposition to be proved is not true.)
  • $\beta$ is a $k\times 1$ column vector.

The hat matrix $H$ is the matrix of the orthogonal projection onto the column space of $X$.

The vector of fitted values is $HY$. A residual is an observed value minus a fitted value (both with the same index $i\in\{1,\ldots,n\}$). Hence the vector of residuals is $e=(I-H)Y$.
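The projection and residual definitions above can be checked numerically. This is an illustrative sketch with simulated data (the design matrix, sample size, and seed are arbitrary choices, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Design matrix with a column of 1s (intercept), n >> k.
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = rng.normal(size=n)

# Hat matrix: orthogonal projection onto the column space of X.
# H = X (X'X)^{-1} X', computed via a linear solve for stability.
H = X @ np.linalg.solve(X.T @ X, X.T)

# H fixes every vector in the column space of X, so HX = X.
assert np.allclose(H @ X, X)

fitted = H @ Y        # vector of fitted values
e = Y - fitted        # residuals: observed minus fitted, i.e. e = (I - H) Y
assert np.allclose(e, (np.eye(n) - H) @ Y)
```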

Notice that

$$ \mathbf 1'H =\underbrace{(H'\mathbf 1)' = (H\mathbf 1)'}_\text{since $H$ is symmetric} = \mathbf 1' \tag 1 $$ because $\mathbf 1$ is in the column space of $X$.

So $$ \sum_{i=1}^n e_i = \mathbf 1' e = \mathbf 1' (I-H)Y = (\mathbf 1' I - \mathbf 1' H) Y = (\mathbf 1'-\mathbf 1')Y = 0. $$
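Both steps of this argument, $\mathbf 1'H = \mathbf 1'$ and then $\mathbf 1'e = 0$, can be verified on simulated data (a sketch; the data and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
# X contains a column of 1s, so 1 lies in its column space.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
e = (np.eye(n) - H) @ Y                 # residual vector

ones = np.ones(n)
# 1'H = 1', equation (1), because 1 is in the column space of X.
assert np.allclose(ones @ H, ones)
# Hence the residuals sum to zero: 1'e = 1'(I - H)Y = (1' - 1')Y = 0.
assert abs(e.sum()) < 1e-10
```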

In the simple case where the model says $y_i=\beta_1 + \beta_2 x_i + \varepsilon_i$, the model can be written as $$ \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix} $$ or $$ Y = X\beta+\varepsilon. $$ The same argument as above, which shows that the sum of the residuals is $0$, i.e. that $$ \sum_{i=1}^n e_i =0 $$ (note that $e_i$ is not the same thing as $\varepsilon_i$), also shows that $$ \sum_{i=1}^n e_i x_i = 0, $$ since the column $(x_1,\ldots,x_n)'$ is also in the column space of $X$. That is why there are $n-2$ degrees of freedom for error.
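Both constraints in the simple linear model can be confirmed on simulated data. A minimal sketch (the coefficients, noise, and seed are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = rng.normal(size=n)
Y = 1.5 + 2.0 * x + rng.normal(size=n)   # y_i = beta1 + beta2 x_i + eps_i

# Fit by least squares with an intercept column.
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ beta_hat                      # residuals

# Two linear constraints on the n residuals:
assert abs(e.sum()) < 1e-10               # sum of e_i is 0
assert abs(e @ x) < 1e-10                 # sum of e_i * x_i is 0
# These two constraints are why the error has n - 2 degrees of freedom.
```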

The vector $\varepsilon$ of errors is often taken to be a vector of uncorrelated (if not independent) random variables with expectation $0$ and equal variances (if not identically distributed). The vector $e = \hat\varepsilon = (I-H)\varepsilon$, on the other hand, is the vector of residuals, as opposed to errors, and the residuals cannot be uncorrelated because they satisfy the two linear constraints explained above, i.e. those two sums must be $0$. Nor do they all have the same variance. Their covariance matrix is $\sigma^2(I-H)$ where $\sigma^2=\operatorname{var}(\varepsilon_i)$.
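The claim that the residuals have covariance matrix $\sigma^2(I-H)$ can be checked by Monte Carlo simulation. A sketch under assumed, made-up parameters (small $n$, fixed $\sigma$, arbitrary design):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 8, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H        # since (I - H)X @ beta = 0, e = (I - H) eps = M eps

# Simulate many error vectors and compute the residuals for each.
reps = 200_000
eps = sigma * rng.normal(size=(reps, n))
e = eps @ M.T            # each row is one replication's residual vector
cov_hat = np.cov(e, rowvar=False)

# Empirical covariance matches sigma^2 (I - H) up to Monte Carlo error;
# the unequal diagonal entries show the residual variances differ.
assert np.allclose(cov_hat, sigma**2 * M, atol=0.1)
```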