Errors and Residuals


Why are errors independent but residuals dependent?

As far as I know, the sum of the residuals within a random sample is necessarily zero, so the residuals cannot be independent. But we also assume that $\mathbb E(\epsilon)=0$. Why doesn't that imply the errors are also not independent?

There are 3 answers below.

Answer 1:

Your model can be biased, so the errors can sum to something other than 0 (and even for an unbiased model, their sum is a random quantity), while the residuals, as you correctly point out, are constrained.
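As a quick numerical sketch of this point (the quadratic data and the constant "model" below are illustrative assumptions, not from the answer): under a misspecified model, the errors have a nonzero mean, so their sum drifts away from zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.uniform(0, 1, n)

# True relationship is quadratic, but we (wrongly) model y by a constant c.
y = x**2 + rng.normal(0, 0.1, n)
c = 0.5                       # a biased "model" for E[y]
model_errors = y - c          # errors of the misspecified model

# Mean error is about E[x^2] - 0.5 = 1/3 - 1/2, not 0,
# so the error sum grows with n instead of being constrained to 0.
print(model_errors.mean())
```

Nothing constrains these errors to sum to zero; a least-squares fit, by contrast, would force its residuals to do so.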

Answer 2:

Your question is best explained in a broader context: What is the difference between "error" and "residual"?

In regression, residuals are calculated from a fitted model whose underlying parameters are estimated from the observed data, because the true parameters are unknown to us. For this reason, residuals are not independent: the fitting procedure imposes constraints on them. In ordinary least squares, for example, the normal equations force the residuals to be orthogonal to every column of the design matrix, so a model with an intercept has residuals that sum to exactly zero.

This speaks to a subtle but important property of residuals: they are in a sense estimates or realizations of error conditional on the assumption that the true error is faithfully represented by the data you observed. Error in a model is intended to capture natural random variation of the response (dependent) variable not explained by the predictors (independent variables). But a residual could be calculated from any model fit and it need not be true to this underlying error.
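A minimal numerical sketch of this distinction (using numpy; the particular intercept, slope, and sample size are illustrative assumptions): the residuals of an intercept-containing least-squares fit sum to zero by construction, while the true errors, which we normally cannot observe, are an unconstrained random sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
true_errors = rng.normal(0, 1, n)    # the unobservable errors
y = 2.0 + 3.0 * x + true_errors      # true model: intercept 2, slope 3

# Fit ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

print(residuals.sum())      # ~0: forced by the intercept's normal equation
print(true_errors.sum())    # a random quantity, not constrained to 0
```

The residual sum is zero (up to floating-point error) for any data fed to this fit, correct model or not, which is exactly the sense in which residuals need not be "true to" the underlying error.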

Answer 3:

The simplest case is this:

$X_1,\ldots,X_n$ are uncorrelated and have expected value $\mu$ and variance $\sigma^2.$

$\overline X = (X_1+\cdots+X_n)/n$ therefore has expected value $\mu$ and variance $\sigma^2/n.$

The errors are $\varepsilon_i = X_i-\mu.$ The sum of the errors has expected value $0$ and variance $n\sigma^2$ (their mean, $\overline X - \mu,$ has variance $\sigma^2/n$), so the sum is a random quantity, not constrained to be zero.

The residuals are $\widehat{\varepsilon\,}_i = X_i - \overline X.$ Their sum is necessarily zero, so they are negatively correlated: indeed $\operatorname{Cov}(\widehat{\varepsilon\,}_i, \widehat{\varepsilon\,}_j) = -\sigma^2/n$ for $i \neq j.$
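This simplest case can be checked by simulation (a sketch using numpy; the sample size, number of repetitions, and σ are arbitrary choices): draw many samples of size $n$, form the residuals $X_i - \overline X$ in each, and estimate the covariance between two residuals across repetitions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 5, 200_000
sigma = 2.0

# Many samples of size n; mu = 0 here, so the errors are just X_i.
X = rng.normal(0.0, sigma, size=(reps, n))
resid = X - X.mean(axis=1, keepdims=True)   # residuals X_i - Xbar

# Within each sample the residuals sum to (numerically) zero.
print(np.abs(resid.sum(axis=1)).max())      # ~0

# Empirical covariance between residuals 1 and 2 across samples,
# compared with the theoretical value -sigma^2 / n:
emp_cov = np.cov(resid[:, 0], resid[:, 1])[0, 1]
print(emp_cov, -sigma**2 / n)
```

The empirical covariance lands near $-\sigma^2/n = -0.8$, confirming the negative correlation forced by the zero-sum constraint.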