Why doesn't the linear regression preserve the standard deviation?


If we model $Y = \beta X$, we can estimate $\beta$ to minimize

$$\sum (Y_i - \beta X_i)^2$$

Taking the derivative with respect to $\beta$ and setting it to zero, we get $\sum 2\beta X_i^2 - 2Y_iX_i = 0 \implies \beta = \frac{\sum Y_i X_i}{\sum X_i^2}.$

Why does the best fit for $Y = \beta X$ not even have the same variance as the observed $Y$?
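As a quick numerical check of the question's premise, here is a sketch with made-up data (the true slope and noise level are arbitrary assumptions): the through-origin least-squares fit generally has smaller variance than $Y$ itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)  # true slope 2, plus noise

# Through-origin least squares: beta = sum(y*x) / sum(x**2)
beta = np.sum(y * x) / np.sum(x ** 2)
fitted = beta * x

print(np.var(y), np.var(fitted))  # fitted values have strictly smaller variance
```

The fitted values only capture the part of $Y$ explained by $X$, so their variance falls short of $\operatorname{Var}(Y)$ by roughly the noise variance.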

There are 2 answers below.

Answer 1

Depending on the measured data -- for example, if it forms a "dot cloud" at a distinct distance from the origin -- it might be reasonable to take as the balance line the line from the origin through the cloud's barycentre $(\bar{x}, \bar{y})$, instead of the "best fit" from the LS method, which minimises the vertical distances from the observed points to the resulting line.

The slope of this line through the origin is $\displaystyle\beta=\frac {\sum_k y_k}{\sum_k x_k}=\frac{\bar{y}}{\bar{x}}$.
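A minimal sketch of this centroid-based slope (the data here is illustrative, not from the answer):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 6.0])

# Line from the origin through the barycentre (x̄, ȳ):
# slope = ȳ / x̄, which equals sum(y) / sum(x)
beta = y.mean() / x.mean()

print(beta)  # 16 / 10 = 1.6
```

By construction this line passes exactly through $(\bar{x}, \bar{y})$, which the ordinary through-origin LS line generally does not.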

(This conforms to what I remember from studies decades ago.)

Answer 2

There are two issues here. One is that with simple regression (allowing an intercept) you would expect the variance of the fitted values to be less than that of the observations unless the fit is perfect: indeed, this shrinkage toward the mean is what originally gave regression its name. The other is that forcing the regression line through the origin can lead to peculiar results if that is not the actual relationship.

As an illustration, consider linear regression with the observations $(1,25)$, $(2,21)$, $(3,23)$, $(4,24)$, $(5,22)$.

  • If you allow an intercept, then the fitted line is $\hat y_i = 23.9 -0.3x_i$ and so the variance of the $\hat y_i$ is $R^2=0.09$ times the variance of the $y_i$, largely because the linear relationship appears to be weak. (See the points and regression line in black below.) To get the variance of the $\hat y_i$ up to that of the $y_i$ you would need something like $\hat y_i = 20 +x_i$ or $\hat y_i = 26 - x_i$, both of which would be worse fits (shown in pink below).

  • If you do not allow an intercept, then the fitted line is $\hat y_i = \frac{342}{55}x_i $ and so the variance of the $\hat y_i$ is over $38$ times the variance of the $y_i$, largely because a line through the origin is nowhere near a sensible model of this data. To get the variance of the $\hat y_i$ down to that of the $y_i$ you would need something like $\hat y_i = x_i$ or $\hat y_i = - x_i$, neither of which goes anywhere near the data.
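These numbers can be verified directly; a NumPy sketch using the data from the answer:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([25.0, 21.0, 23.0, 24.0, 22.0])

# With an intercept: slope = Sxy/Sxx, intercept = ȳ - slope·x̄
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
fit1 = intercept + slope * x
print(slope, intercept)                # -0.3 and 23.9
print(np.var(fit1) / np.var(y))        # 0.09, i.e. R^2

# Through the origin: slope = sum(xy)/sum(x^2) = 342/55
beta0 = np.sum(x * y) / np.sum(x ** 2)
fit2 = beta0 * x
print(np.var(fit2) / np.var(y))        # about 38.7, far above 1
```

The variance ratio is $R^2$ in the intercept case but $\beta^2 \sum x_i^2$-driven in the through-origin case, which is why it can blow up when the origin constraint is wrong.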

This is illustrated below:

  • the black Os show the points
  • the blue regression line minimises the sum of squares of the residuals
  • the cyan lines (through the means of the data) illustrate what would make the variance of the fitted values match the variance of the original values
  • the red line minimises the sum of squares of the residuals given that the intercept is $0$
  • the pink lines illustrate what would make the variance of the fitted data match the variance of the original data given that the intercept is $0$

The blue line is not a particularly good fit to the data, but the others are clearly worse.

[Figure: the regression lines described in the list above]