Consider a linear regression model, i.e., $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where $\epsilon_i$ satisfies the classical assumptions and the coefficients $(\beta_0, \beta_1)$ are estimated by least squares. What would be an intuitive explanation of why the sum of the residuals is $0$? I know how to show this algebraically, but I cannot seem to grasp the concept and intuition behind it. Any explanations?
An intuitive explanation of why the sum of residuals is $0$
A data point $x_i$ is equal to the mean of all the data, $\bar x$, plus its residual $r_i$:
$x_i=\bar x + r_i$
If the sum of all residuals, $R=\sum r_i$, were not $0$, the definition of the mean would be contradicted:
$\bar x=\displaystyle \sum x_i /n= \sum (\bar x +r_i)/n = \bar x + \frac{1}{n}\sum r_i = \bar x + \frac{R}{n} \;\to\; \bar x = \bar x + \frac{R}{n}$
$\to R= \sum r_i=0$.
Concept: the signed distances between the values and their mean sum to zero; otherwise $\bar x$ would not be a central parameter, contradicting its defining property.
The mean's intrinsic property is central tendency: the data are centered around it, so the positive and negative deviations cancel exactly.
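A minimal numerical sketch of this cancellation, assuming Python/NumPy (the sample values are arbitrary and purely illustrative):

```python
import numpy as np

# A hypothetical sample (any numbers work).
x = np.array([2.0, 5.0, 7.0, 11.0, 3.0])

x_bar = x.mean()   # the sample mean
r = x - x_bar      # deviations (residuals) from the mean

# The signed deviations cancel exactly, up to floating-point error.
print(r.sum())                    # ~0.0
print(np.isclose(r.sum(), 0.0))   # True
```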
If you understand it algebraically, then everything else is merely a post-hoc justification. As such, let me recall where this feature comes from. Once you fit a model with an intercept term $\beta_0$ by minimizing $S(\beta) = \sum (y_i - \beta_0 - \beta_1 x_i)^2$, setting the derivative of $S(\beta)$ w.r.t. $\beta_0$ to zero at the optimum yields $$ \frac{\partial}{\partial \beta_0 } S(\beta) = -2\sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = -2\sum e_i =0. $$ Namely, this feature is just a consequence of fitting a model with an intercept term, $\beta_0$. For a model without an intercept, i.e., $y_i = \beta_1 x_i + \epsilon_i$, the residuals will (in general) not sum to $0$. However, one of the assumptions of a linear model (with or without an intercept) is that $\mathbb{E}[\epsilon_i|X] = 0$, so it is a good feature to have $\bar{e}_n = 0$, as the mean of the residuals is the estimator of $\mathbb{E}[\epsilon_i|X]$, which we assume to equal $0$. Basically, you can view a model with an intercept as a model without an intercept but with shifted noise, namely $$ y_i = \beta_1x_i + \epsilon_i, $$ where $\mathbb{E}[\epsilon_i |X] = \beta_0 \neq 0$ and $\operatorname{Var}(\epsilon_i|X_i) = \sigma^2$. In this case, if you neglect to estimate the expected value of the noise term, your estimator of $\beta_1$ will be biased. As such, you can view the intercept as a constant that enforces the assumption that the expected value of the noise is zero; this translates into residuals that sum to zero, with the estimated $\beta_0$ as the "payment" for this assumption.
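To see the first-order condition at work, here is a hedged sketch: the simulated data, coefficient values, and the use of NumPy's `lstsq` are my own choices for illustration, not part of the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
# True model with a nonzero intercept (values chosen arbitrarily).
y = 3.0 + 2.0 * x + rng.normal(0, 1, n)

# OLS *with* an intercept: design matrix [1, x].
X_with = np.column_stack([np.ones(n), x])
beta_with, *_ = np.linalg.lstsq(X_with, y, rcond=None)
resid_with = y - X_with @ beta_with

# OLS *without* an intercept: design matrix [x] only.
X_wo = x.reshape(-1, 1)
beta_wo, *_ = np.linalg.lstsq(X_wo, y, rcond=None)
resid_wo = y - X_wo @ beta_wo

print(resid_with.sum())   # ~0: the first-order condition for beta_0
print(resid_wo.sum())     # generally far from 0
print(beta_wo[0])         # slope estimate absorbs the ignored intercept (biased upward here)
```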
The residuals should sum to zero. Notice that this is the same as the residuals having zero mean. If the residuals did not have zero mean, then the average error in the sample would not be zero, and an easy way to get a better estimate of the desired parameter would be to subtract this average error from our estimate; the least-squares fit cannot leave such an easy improvement on the table, so its residuals must average to zero.
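A short sketch of that "easy improvement" argument, assuming NumPy and simulated data of my own choosing: a slope-only fit leaves a nonzero average error, and absorbing that average into an intercept can only lower the sum of squared errors, since $\sum(e_i-\bar e)^2 = \sum e_i^2 - n\bar e^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 3.0 + 2.0 * x + rng.normal(0, 1, 100)

# Fit a slope-only line (no intercept), so the residuals need not average to zero.
slope = (x @ y) / (x @ x)
e = y - slope * x
print(e.mean())               # nonzero average error

# Subtracting the average error (i.e., adding it to the intercept) never hurts:
sse_before = np.sum(e ** 2)
sse_after = np.sum((e - e.mean()) ** 2)
print(sse_before, sse_after)  # sse_after <= sse_before
```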