Why does the sum of residuals equal 0 when we do a sample regression by OLS?

That's my question. I have been looking around online and people post a formula, but they don't explain it. Could anyone please give me a hand with this? Cheers
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
Take the estimated values from the line of best fit and subtract them from the original y values, then add the differences up. If it is a good line of best fit the sum should approach zero, but for bad lines of best fit it will be much less or much greater than zero.
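For a quick numerical check, here is a minimal Python sketch (the data values are made up for illustration) that fits a line by least squares and sums the residuals:

```python
import numpy as np

# Made-up sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix: a column of ones (intercept) plus the regressor x
X = np.column_stack([np.ones_like(x), x])

# OLS fit: coefficients that minimize the sum of squared residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta
print(residuals.sum())  # ~0 up to floating-point error
```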
The accepted solution by Alecos Papadopoulos has a mistake at the end. I can't comment, so I will have to submit this correction as a solution; sorry.

It's true that a column of ones would do the job, but it's not true that we need it. We do not need the regressor matrix to contain a column of ones in order to have $\mathbf M\mathbf i = \mathbf 0$.
Theorem: If there exists a $p \times 1$ vector $v$ such that $$Xv = 1_n,$$ where $1_n$ is an $n \times 1$ vector of ones, then $$\sum_{i=1}^n e_i = 0.$$

Proof: $\sum_{i=1}^n e_i = e^T 1_n = e^T X v = (e^T X) v = (X^T e)^T v = \mathbf 0^T v = 0.$
Above I am using the fact that $X^T e = \mathbf 0$ (the normal equations). Having a column of ones in $X$ (i.e. an intercept) is just a special case of such a $v$: if the intercept is in the first column, then $v = [1, 0, 0, \dots, 0]^T$.
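To illustrate the theorem numerically, here is a Python sketch with a made-up design of my own: two group dummies, no explicit intercept column, but the columns sum to the ones vector, so $v = [1, 1]^T$ satisfies $Xv = 1_n$:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10
# Two group dummies: no intercept column, but the two columns add up to a
# vector of ones, so v = [1, 1]^T satisfies X v = 1_n
g = rng.integers(0, 2, size=n)
X = np.column_stack([g == 0, g == 1]).astype(float)

y = rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
print(e.sum())  # ~0: the theorem applies even without an explicit intercept
```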
I want to provide a more general answer from the statistical sense of the word "residual". I figured this out in my quest to understand degrees of freedom and Bessel's correction in statistics.
A residual in statistics means the difference between a variable's value and the sample mean (not the true, usually unknowable, population mean).

So if $x_i,\ i \in \{1, \dots, N\}$ are the sample values and $\bar x = \frac{1}{N}\sum_i x_i$ is the sample mean:

$$ \sum_i r_i = \sum_i (x_i - \bar x) \\ = \sum_i \left(x_i - \frac{1}{N}\sum_j x_j\right) \\ = \sum_i x_i - N \cdot \frac{1}{N}\sum_j x_j \\ = 0 $$
This is the basis of the argument behind Bessel's correction, which is the practice of dividing the sum of squared residuals by $N-1$ rather than $N$ when estimating the variance:

$$ s^2 = \frac{1}{N - 1}\sum_i (x_i - \bar x)^2 $$
The idea (according to Wikipedia) is that the residuals are not independent because they sum to zero, so you subtract one degree of freedom. I do not fully understand this statement (how do we know that the span of the residuals has dimension $N-1$ and not, say, $N-2$?). However, the proof above explains the intuition behind the correction from a functional/applied point of view.
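As a quick numerical illustration, here is a small simulation sketch with parameters of my own choosing (the true variance here is $\sigma^2 = 4$), showing both the zero-sum property and the effect of the correction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Residuals about the sample mean always sum to (numerically) zero
x = rng.normal(loc=3.0, scale=2.0, size=50)
r = x - x.mean()
print(r.sum())  # ~0 up to floating-point error

# Simulation of Bessel's correction: dividing by N-1 is unbiased,
# dividing by N underestimates the true variance (sigma^2 = 4 here)
N, trials = 10, 100_000
samples = rng.normal(loc=3.0, scale=2.0, size=(trials, N))
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
print(ss.mean() / (N - 1))  # ~4.0
print(ss.mean() / N)        # ~3.6, biased low
```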
If the OLS regression contains a constant term, i.e. if the regressor matrix contains a column of ones, then the sum of residuals is exactly equal to zero, as a matter of algebra.
For the simple regression,
specify the regression model $$y_i = a +bx_i + u_i\,,\; i=1,...,n$$
Then the OLS estimator $(\hat a, \hat b)$ minimizes the sum of squared residuals, i.e.
$$(\hat a, \hat b) : \sum_{i=1}^n(y_i - \hat a - \hat bx_i)^2 = \min$$
For the OLS estimator to be the argmin of the objective function, it must be the case, as a necessary condition, that the first partial derivatives with respect to $a$ and $b$, evaluated at $(\hat a, \hat b)$, equal zero. For our result, we need only consider the partial w.r.t. $a$:
$$\frac {\partial}{\partial a} \sum_{i=1}^n(y_i - a - bx_i)^2 \Big |_{(\hat a, \hat b)} = 0 \Rightarrow -2\sum_{i=1}^n(y_i - \hat a - \hat bx_i) = 0 $$
But $y_i - \hat a - \hat bx_i = \hat u_i$, i.e. is equal to the residual, so we have that
$$\sum_{i=1}^n(y_i - \hat a - \hat bx_i) = \sum_{i=1}^n\hat u_i = 0 $$
The above also implies that if the regression specification does not include a constant term, then the sum of residuals will not, in general, be zero.
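To see both cases numerically, here is a minimal Python sketch with simulated data of my own (not part of the original answer), fitting the same data with and without a constant term:

```python
import numpy as np

rng = np.random.default_rng(2)

x = rng.normal(size=30)
y = 1.5 + 2.0 * x + rng.normal(size=30)

# With a constant term: residuals sum to ~0, as the first-order condition requires
Xc = np.column_stack([np.ones_like(x), x])
bc, *_ = np.linalg.lstsq(Xc, y, rcond=None)
print((y - Xc @ bc).sum())   # ~0

# Without a constant term: the sum of residuals is generally nonzero
Xn = x.reshape(-1, 1)
bn, *_ = np.linalg.lstsq(Xn, y, rcond=None)
print((y - Xn @ bn).sum())   # generally != 0
```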
For the multiple regression,
let $\mathbf X$ be the $n \times k$ matrix containing the regressors, $\hat {\mathbf u}$ the residual vector and $\mathbf y$ the dependent variable vector. Let $\mathbf M = I_n-\mathbf X(\mathbf X'\mathbf X)^{-1}\mathbf X'$ be the "residual-maker" matrix, called thus because we have
$$\hat {\mathbf u} = \mathbf M\mathbf y$$
It is easily verified that $\mathbf M \mathbf X = \mathbf 0$. Also $\mathbf M$ is idempotent and symmetric.
Now, let $\mathbf i$ be a column vector of ones. Then the sum of residuals is
$$\sum_{i=1}^n \hat u_i = \mathbf i'\hat {\mathbf u} =\mathbf i'\mathbf M\mathbf y = \mathbf i'\mathbf M'\mathbf y = (\mathbf M\mathbf i)'\mathbf y = \mathbf 0' \mathbf y = 0$$
So we need the regressor matrix to contain a column of ones, so that we get $\mathbf M\mathbf i = \mathbf 0$.
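A short numerical check of these matrix facts (a Python sketch with simulated data, not from the original answer):

```python
import numpy as np

rng = np.random.default_rng(3)

n, k = 20, 3
# Regressor matrix with a column of ones as the first column
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)

# Residual-maker matrix M = I - X (X'X)^{-1} X'
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(M @ X, 0))  # M X = 0
print(np.allclose(M @ M, M))  # idempotent
print(np.allclose(M, M.T))    # symmetric

u_hat = M @ y                 # residual vector
print(np.ones(n) @ u_hat)     # i' u_hat ~ 0, since X contains a column of ones
```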