I've been reading through the Wikipedia article on degrees of freedom (statistics). There is a section about residuals, in relation to least squares estimation. The article says:
Suppose you have some model $Y_i=a+bx_i + \epsilon_i \text{ for } i=1,...,n$.
Let $\hat a$ and $\hat b$ be least squares estimators of $a$ and $b$.
We can compute the residuals as follows: $\hat e_i=y_i-(\hat a + \hat b x_i)$.
The article then says that these residuals are constrained to lie within the space defined by:
$\hat e_1 + \dots + \hat e_n=0$ and $x_1 \hat e_1 + \dots + x_n \hat e_n=0$.
Hence, they say there are $n-2$ degrees of freedom for error.
So, my first question is, where have these two constraints come from?
I guess the first one comes from the fact that the mean of the residuals is supposed to be $0$. The second one, I am not sure about.
I suppose when they say there are $n-2$ degrees of freedom for error, it means the residuals are constrained to lie within an ($n-2$)-dimensional subspace? Hence, my second question is, why do these constraints mean that the residuals are constrained to an ($n-2$)-dimensional subspace?
When you form a design matrix $Z$ (with $n$ rows and $p$ columns, assumed to have full column rank), the column space of $Z$ is a $p$-dimensional subspace of $\mathbb{R}^n$. The residual vector -- the difference between the response vector and its least squares projection onto the column space of $Z$ -- must be perpendicular to the column space of $Z$. So the residual vector is constrained to lie in the left nullspace of $Z$, whose dimension is $n-p$; hence the residual vector has $n-p$ degrees of freedom.
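A quick numerical illustration of this geometry (a sketch with made-up data; `numpy.linalg.lstsq` is one standard way to compute the least squares fit):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10

# Design matrix Z: an intercept column plus one covariate, so p = 2.
x = rng.normal(size=n)
Z = np.column_stack([np.ones(n), x])

# Response generated from the model, then the least squares fit.
y = 1.0 + 2.0 * x + rng.normal(size=n)
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)

# The residual vector is perpendicular to every column of Z,
# i.e. Z^T e = 0, so e lies in the left nullspace of Z.
e = y - Z @ beta_hat
print(np.allclose(Z.T @ e, 0))  # True (up to floating point error)
```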
In the example from the question, $p=2$, so there are $n-2$ degrees of freedom for error. Furthermore, as stated, the residual vector is perpendicular to the column space -- in other words, perpendicular to each column of $Z$. Hence $\hat e^T z_j$ must equal zero for every column $z_j$ of $Z$, and each of these dot products yields one constraint. The first column of $Z$ is all $1$s, so the first constraint says the residuals sum to $0$. Each column after that contains the values of a covariate, so each remaining constraint says the residuals are orthogonal to that covariate, e.g. $x_1 \hat e_1 + \dots + x_n \hat e_n = 0$.
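For the simple regression in the question, the two dot products are exactly the two stated constraints. A minimal check, using small invented data:

```python
import numpy as np

# Toy data for the model y_i = a + b*x_i + noise.
x = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
y = np.array([1.2, 2.1, 2.8, 3.9, 5.5])

# Least squares estimates of a and b via the design matrix [1, x].
Z = np.column_stack([np.ones_like(x), x])
a_hat, b_hat = np.linalg.lstsq(Z, y, rcond=None)[0]

# The two constraints from the question hold up to floating point error:
e = y - (a_hat + b_hat * x)
print(np.isclose(e.sum(), 0.0))       # e_1 + ... + e_n = 0
print(np.isclose(np.dot(x, e), 0.0))  # x_1*e_1 + ... + x_n*e_n = 0
```

Both checks print `True`: with $n=5$ residuals and these two constraints, the residual vector is free to vary only in a $3$-dimensional subspace.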