distribution of residuals in linear regression


We consider the "common" linear regression model: $$y=\alpha + \beta x + \epsilon \;\;\;\;\mbox{ with } \;\;\; \epsilon \sim N(0, \sigma^2)$$

In all the textbooks I am using, the residuals are standardized by defining:

$$S = \frac{y_i - A - Bx_i}{\sqrt{RSS/(n-2)}}$$ where $A$ and $B$ are the least-squares estimators of $\alpha$ and $\beta$, respectively, and $RSS = \sum^{n}_{i=1}(y_i - A - B x_i)^2$. Moreover, the authors state that, when the linear regression model is correct, $S$ is $\textbf{approximately normal}$.
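To make the definition of $S$ concrete, here is a minimal sketch in plain Python. It assumes that $A$ and $B$ are the ordinary least-squares estimates; the helper names `fit_line` and `standardized_residuals`, and the simulated parameters in the demo, are hypothetical choices for illustration only.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares estimates A (intercept) and B (slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def standardized_residuals(xs, ys):
    """S_i = (y_i - A - B x_i) / sqrt(RSS / (n - 2)) for every observation."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    residuals = [y - a - b * x for x, y in zip(xs, ys)]
    rss = sum(e ** 2 for e in residuals)
    scale = math.sqrt(rss / (n - 2))  # the same scale is used for every i
    return [e / scale for e in residuals]


if __name__ == "__main__":
    # Simulated data from y = alpha + beta x + eps with hypothetical
    # alpha = 1, beta = 2, sigma = 0.5.
    random.seed(0)
    xs = [i / 10 for i in range(50)]
    ys = [1.0 + 2.0 * x + random.gauss(0.0, 0.5) for x in xs]
    s = standardized_residuals(xs, ys)
    print(len(s), sum(s))
```

Note that every residual is divided by the same quantity $\sqrt{RSS/(n-2)}$, whatever the value of $x_i$. As a consequence, the $S$ values always sum to zero and their squares sum to exactly $n-2$.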

On the other hand, when considering the predicted response of the model (under the same hypotheses), the following statistic is derived for a new response $Y$ observed at $x_0$, independent of the data used to compute $A$ and $B$: $$ \frac{Y - A - B x_0}{\sqrt{\frac{RSS}{n-2}}\,\sqrt{ \frac{n+1}{n} + \frac{(x_0 - \bar{x})^2}{ \sum_{i=1}^{n}(x_i - \bar{x})^2 } }} \sim t_{n-2}$$
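The prediction statistic can be sketched the same way. This sketch assumes least-squares $A$ and $B$ and a new observation `y_new` at `x0` that is independent of the fitted data. The function names are hypothetical. Because only the standard library is used, the Student-$t$ quantile `t_crit` (for example, $t_{0.025,\,n-2}$ from a table) has to be supplied by the caller.

```python
import math


def fit_line(xs, ys):
    """Least-squares estimates A (intercept) and B (slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def prediction_scale(xs, ys, x0):
    """sqrt(RSS/(n-2)) * sqrt((n+1)/n + (x0 - x_bar)^2 / S_xx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a, b = fit_line(xs, ys)
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    # Unlike the scale used for S, this one depends on x0.
    return math.sqrt(rss / (n - 2)) * math.sqrt((n + 1) / n + (x0 - x_bar) ** 2 / s_xx)


def prediction_pivot(xs, ys, x0, y_new):
    """(Y - A - B x0) / scale, which is t_{n-2} distributed for a new, independent Y."""
    a, b = fit_line(xs, ys)
    return (y_new - a - b * x0) / prediction_scale(xs, ys, x0)


def prediction_interval(xs, ys, x0, t_crit):
    """Prediction interval A + B x0 +/- t_crit * scale for a new response at x0."""
    a, b = fit_line(xs, ys)
    half_width = t_crit * prediction_scale(xs, ys, x0)
    center = a + b * x0
    return center - half_width, center + half_width
```

With the data `xs = [0, 1, 2, 3, 4]` and $n - 2 = 3$ degrees of freedom, `prediction_interval(xs, ys, 2.0, 3.182)` gives a 95% interval centered at the fitted value $A + 2B$.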

Unlike $S$, this second statistic is obtained by taking into account the actual variances of $A$ and $B$. So the question is: shouldn't we use the second one instead? Why do the authors prefer $S$?