Why does the sum of residuals equal 0 when we do a sample regression by OLS?

That's my question. I have been looking around online and people post a formula, but they don't explain it. Could anyone please give me a hand with this? Cheers
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
Take the estimated values from the line of best fit and subtract them from the original y values, then add the differences up. If it is a good line of best fit the sum should approach zero, but for bad lines of best fit it will be much less or much greater than zero.
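For a quick numerical check, here is a minimal Python sketch (the data values are made up for illustration) that fits a line by least squares and sums the residuals:

```python
import numpy as np

# Made-up sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix: a column of ones (intercept) plus the regressor x
X = np.column_stack([np.ones_like(x), x])

# OLS fit: coefficients that minimize the sum of squared residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta
print(residuals.sum())  # ~0 up to floating-point error
```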
The accepted solution by Alecos Papadopoulos has a mistake at the end. I can't comment, so I will have to submit this correction as a solution; sorry.

It's true that a column of ones would do the job, but it's not true that we need it. We do not need the regressor matrix to contain a column of ones in order to have $\mathbf M\mathbf i = \mathbf 0$.
Theorem: If there exists a $p \times 1$ vector $v$ such that $$Xv = 1_n,$$ where $1_n$ is an $n \times 1$ vector of ones, then $$\sum_{i=1}^n e_i = 0.$$

Proof: $\sum_{i=1}^n e_i = e^T 1_n = e^T X v = (e^T X) v = (X^T e)^T v = \mathbf 0^T v = 0.$
Above I am using the fact that $X^T e = \mathbf 0$ (the normal equations). Having a column of ones in $X$ (i.e. an intercept) is just a special case of such a $v$: if the intercept is in the first column, then $v = [1, 0, 0, \dots, 0]^T$.
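To illustrate the theorem numerically, here is a Python sketch with a made-up design of my own: two group dummies, no explicit intercept column, but the columns sum to the ones vector, so $v = [1, 1]^T$ satisfies $Xv = 1_n$:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10
# Two group dummies: no intercept column, but the two columns add up to a
# vector of ones, so v = [1, 1]^T satisfies X v = 1_n
g = rng.integers(0, 2, size=n)
X = np.column_stack([g == 0, g == 1]).astype(float)

y = rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
print(e.sum())  # ~0: the theorem applies even without an explicit intercept
```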
I want to provide a more general answer from the statistical sense of the word "residual". I figured this out in my quest to understand degrees of freedom and Bessel's correction in statistics.
A residual in statistics means the difference between a variable's value and the sample mean (not the true, usually unknowable, population mean).

So if $x_i,\ i \in \{1, \dots, N\}$ are the sample values and $\bar x = \frac{1}{N}\sum_i x_i$ is the sample mean:

$$ \sum_i r_i = \sum_i (x_i - \bar x) \\ = \sum_i \left(x_i - \frac{1}{N}\sum_j x_j\right) \\ = \sum_i x_i - N \cdot \frac{1}{N}\sum_j x_j \\ = 0 $$
This is the basis of the argument behind Bessel's correction, which is the practice of dividing the sum of squared residuals by $N-1$ rather than $N$ when estimating the variance:

$$ s^2 = \frac{1}{N - 1}\sum_i (x_i - \bar x)^2 $$
The idea (according to Wikipedia) is that the residuals are not independent because they sum to zero, so you subtract one degree of freedom. I do not fully understand this statement (how do we know that the span of the residuals has dimension $N-1$ and not, say, $N-2$?). However, the proof above explains the intuition behind the correction from a functional/applied point of view.
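As a quick numerical illustration, here is a small simulation sketch with parameters of my own choosing (the true variance here is $\sigma^2 = 4$), showing both the zero-sum property and the effect of the correction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Residuals about the sample mean always sum to (numerically) zero
x = rng.normal(loc=3.0, scale=2.0, size=50)
r = x - x.mean()
print(r.sum())  # ~0 up to floating-point error

# Simulation of Bessel's correction: dividing by N-1 is unbiased,
# dividing by N underestimates the true variance (sigma^2 = 4 here)
N, trials = 10, 100_000
samples = rng.normal(loc=3.0, scale=2.0, size=(trials, N))
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
print(ss.mean() / (N - 1))  # ~4.0
print(ss.mean() / N)        # ~3.6, biased low
```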
If the OLS regression contains a constant term, i.e. if the regressor matrix contains a column of ones, then the sum of residuals is exactly equal to zero, as a matter of algebra.
For the simple regression,
specify the regression model $$y_i = a +bx_i + u_i\,,\; i=1,...,n$$
Then the OLS estimator $(\hat a, \hat b)$ minimizes the sum of squared residuals, i.e.
$$(\hat a, \hat b) : \sum_{i=1}^n(y_i - \hat a - \hat bx_i)^2 = \min$$
For the OLS estimator to be the argmin of the objective function, it must be the case, as a necessary condition, that the first partial derivatives with respect to $a$ and $b$, evaluated at $(\hat a, \hat b)$, equal zero. For our result, we need only consider the partial w.r.t. $a$:
$$\frac {\partial}{\partial a} \sum_{i=1}^n(y_i - a - bx_i)^2 \Big |_{(\hat a, \hat b)} = 0 \Rightarrow -2\sum_{i=1}^n(y_i - \hat a - \hat bx_i) = 0 $$
But $y_i - \hat a - \hat bx_i = \hat u_i$, i.e. is equal to the residual, so we have that
$$\sum_{i=1}^n(y_i - \hat a - \hat bx_i) = \sum_{i=1}^n\hat u_i = 0 $$
The above also implies that if the regression specification does not include a constant term, then the sum of residuals will not, in general, be zero.
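To see both cases numerically, here is a minimal Python sketch with simulated data of my own (not part of the original answer), fitting the same data with and without a constant term:

```python
import numpy as np

rng = np.random.default_rng(2)

x = rng.normal(size=30)
y = 1.5 + 2.0 * x + rng.normal(size=30)

# With a constant term: residuals sum to ~0, as the first-order condition requires
Xc = np.column_stack([np.ones_like(x), x])
bc, *_ = np.linalg.lstsq(Xc, y, rcond=None)
print((y - Xc @ bc).sum())   # ~0

# Without a constant term: the sum of residuals is generally nonzero
Xn = x.reshape(-1, 1)
bn, *_ = np.linalg.lstsq(Xn, y, rcond=None)
print((y - Xn @ bn).sum())   # generally != 0
```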
For the multiple regression,
let $\mathbf X$ be the $n \times k$ matrix containing the regressors, $\hat {\mathbf u}$ the residual vector and $\mathbf y$ the dependent variable vector. Let $\mathbf M = I_n-\mathbf X(\mathbf X'\mathbf X)^{-1}\mathbf X'$ be the "residual-maker" matrix, called thus because we have
$$\hat {\mathbf u} = \mathbf M\mathbf y$$
It is easily verified that $\mathbf M \mathbf X = \mathbf 0$. Also $\mathbf M$ is idempotent and symmetric.
Now, let $\mathbf i$ be a column vector of ones. Then the sum of residuals is
$$\sum_{i=1}^n \hat u_i = \mathbf i'\hat {\mathbf u} =\mathbf i'\mathbf M\mathbf y = \mathbf i'\mathbf M'\mathbf y = (\mathbf M\mathbf i)'\mathbf y = \mathbf 0' \mathbf y = 0$$
So we need the regressor matrix to contain a column of ones, so that we get $\mathbf M\mathbf i = \mathbf 0$.
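A short numerical check of these matrix facts (a Python sketch with simulated data, not from the original answer):

```python
import numpy as np

rng = np.random.default_rng(3)

n, k = 20, 3
# Regressor matrix with a column of ones as the first column
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)

# Residual-maker matrix M = I - X (X'X)^{-1} X'
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(M @ X, 0))  # M X = 0
print(np.allclose(M @ M, M))  # idempotent
print(np.allclose(M, M.T))    # symmetric

u_hat = M @ y                 # residual vector
print(np.ones(n) @ u_hat)     # i' u_hat ~ 0, since X contains a column of ones
```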