Sum of random error in regression


If you know that $\sum_{i=1}^n e_i=0$, what can you say about $\sum_{i=1}^n\epsilon_i$? Is it also $0$? Here $e_i=Y_i-\hat{Y_i}$ and $\epsilon_i=Y_i-E[Y_i]$.

I know that $$Y_i=B_0+B_1X_i+\epsilon_i$$ and $$E[Y_i]=B_0+B_1X_i,$$ so I try $$\sum_{i=1}^n\epsilon_i=\sum_{i=1}^n(Y_i-E[\hat{Y_i}+e_i])=\sum_{i=1}^n(Y_i-E[\hat{Y_i}]-E[e_i])=\sum_{i=1}^n(Y_i-B_0-B_1X_i-e_i),$$ since $$E[\hat{Y_i}]=E[\hat{B_0}+\hat{B_1}X_i+\epsilon_i]=B_0+B_1X_i.$$ Thus $$\sum_{i=1}^n(Y_i-B_0-B_1X_i-e_i)=\sum_{i=1}^n(B_0+B_1X_i+\epsilon_i-B_0-B_1X_i-e_i)=\sum_{i=1}^n(\epsilon_i-e_i),$$ and $$\sum_{i=1}^n(\epsilon_i-e_i)=0\Leftrightarrow \sum_{i=1}^n\epsilon_i=\sum_{i=1}^ne_i\Leftrightarrow \sum_{i=1}^n\epsilon_i=0.$$ Can anyone help?

Best answer

> If you know that $\sum_{i=1}^n e_i=0$, show that $\sum_{i=1}^n\epsilon_i=0$ where $e_i=Y_i-\hat{Y_i}$ and $\epsilon_i=Y_i-E[Y_i]$.
>
> I know that $$Y_i=B_0+B_1X_i+\epsilon_i$$ and $$E[Y_i]=B_0+B_1X_i$$

Some assumptions are omitted here. Usually one writes $Y_i = B_0+ B_1 X_i + \varepsilon_i$ for $i=1,\ldots,n$, where $B_0$, $B_1$, and $X_i$ are not random (or the $X_i$ are treated as non-random because we think of the distributions as conditional on the $X_i$), the pairs $(X_i, Y_i)$ are observed, and the goal is to estimate $B_0$ and $B_1$.

But there are also assumptions about the distributions of $\varepsilon_i$ for $i=1,\ldots,n$. To make this answer as broadly applicable as it can reasonably be, we won't assume $\varepsilon_i$ are independent, but only that they are uncorrelated, and we won't assume they are identically distributed, but only that they all have expected value $0$ and variance $\sigma^2<\infty$.

The least-squares estimates of $B_0$ and $B_1$ are $\hat B_0$ and $\hat B_1$, and:

- the fitted values are $\hat Y_i = \hat B_0 + \hat B_1 X_i$,
- the $i$th error is $\varepsilon_i$, and
- the $i$th residual is $e_i = Y_i - \hat Y_i$ (often called $\hat\varepsilon_i$).

(The residual is an observable estimate of the unobservable error.)

It can be shown to follow from the nature of least-squares estimates that $$ \sum_{i=1}^n e_i = 0 \quad\text{and} \quad \sum_{i=1}^n e_i X_i = 0. $$
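As a sanity check, these two identities can be verified numerically. The following sketch (my own simulated setup, not from the original post; the parameter values are arbitrary) fits the least-squares line with the usual closed-form formulas and confirms that the residuals satisfy both sums:

```python
import numpy as np

# Simulate a simple linear model Y_i = B0 + B1*X_i + eps_i
# (assumed illustrative values for B0, B1, sigma, n).
rng = np.random.default_rng(0)
n = 50
B0, B1, sigma = 2.0, 0.5, 1.0
X = rng.uniform(0, 10, n)
eps = rng.normal(0.0, sigma, n)        # unobservable errors
Y = B0 + B1 * X + eps

# Least-squares estimates via the standard closed-form formulas.
B1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
B0_hat = Y.mean() - B1_hat * X.mean()
Y_hat = B0_hat + B1_hat * X
e = Y - Y_hat                          # observable residuals

print(np.isclose(e.sum(), 0.0))        # True: residuals sum to 0
print(np.isclose((e * X).sum(), 0.0))  # True: residuals orthogonal to X
```

Both sums vanish up to floating-point rounding, exactly as the normal equations guarantee, regardless of the realized errors.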

However, you cannot prove under the assumptions above that $\displaystyle\sum_{i=1}^n \varepsilon_i = 0$. Notice that $$ \operatorname{var}\left( \sum_{i=1}^n \varepsilon_i \right) = \sum_{i=1}^n \operatorname{var} (\varepsilon_i) = \sigma^2+\cdots+\sigma^2 = n\sigma^2>0. $$ (There are no covariance terms because the $\varepsilon_i$ are uncorrelated.)

A random variable with positive variance cannot equal $0$ with probability $1$, so $\sum_{i=1}^n \varepsilon_i$ is not identically $0$.