Why doesn't the linear regression preserve the standard deviation?


If we model $Y = \beta X$, we can estimate $\beta$ to minimize

$$\sum (Y_i - \beta X_i)^2$$

Taking the derivative with respect to $\beta$ and setting it to zero, we get $\sum 2\beta X_i^2 - 2Y_iX_i = 0 \implies \beta = \frac{\sum Y_i X_i}{\sum X_i^2}.$

Why does the best fit for $Y = \beta X$ not even have the same variance as the observed $Y$?
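As a quick numerical check of the question's premise, here is a sketch with made-up data (the true slope and noise level are arbitrary assumptions): the through-origin least-squares fit generally has smaller variance than $Y$ itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)  # true slope 2, plus noise

# Through-origin least squares: beta = sum(y*x) / sum(x**2)
beta = np.sum(y * x) / np.sum(x ** 2)
fitted = beta * x

print(np.var(y), np.var(fitted))  # fitted values have strictly smaller variance
```

The fitted values only capture the part of $Y$ explained by $X$, so their variance falls short of $\operatorname{Var}(Y)$ by roughly the noise variance.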

There are 2 answers below.

Answer 1

Depending on the measured data -- for example, if it forms a "dot cloud" at a distinct distance from the origin -- it might be reasonable to take as the balance line the line from the origin through the cloud's barycentre $(\bar{x}, \bar{y})$, instead of the "best fit" from the LS method, which minimises the vertical distances from the observed points to the resulting line.

The slope of this line through the origin is $\displaystyle\beta=\frac {\sum_k y_k}{\sum_k x_k}=\frac{\bar{y}}{\bar{x}}$.
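A minimal sketch of this centroid-based slope (the data here is illustrative, not from the answer):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 6.0])

# Line from the origin through the barycentre (x̄, ȳ):
# slope = ȳ / x̄, which equals sum(y) / sum(x)
beta = y.mean() / x.mean()

print(beta)  # 16 / 10 = 1.6
```

By construction this line passes exactly through $(\bar{x}, \bar{y})$, which the ordinary through-origin LS line generally does not.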

(This conforms to what I remember from studies decades ago.)

Answer 2

There are two issues here. One is that with simple regression (allowing an intercept) you would expect the variance of the fitted values to be less than that of the observations unless the fit is perfect: indeed, this shrinkage toward the mean is what originally gave regression its name. The other is that forcing the regression line through the origin can lead to peculiar results if that is not the actual relationship.

As an illustration, consider linear regression with the observations $(1,25)$, $(2,21)$, $(3,23)$, $(4,24)$, $(5,22)$.

  • If you allow an intercept, then the fitted line is $\hat y_i = 23.9 -0.3x_i$ and so the variance of the $\hat y_i$ is $R^2=0.09$ times the variance of the $y_i$, largely because the linear relationship appears to be weak. (See the points and regression line in black below.) To get the variance of the $\hat y_i$ up to that of the $y_i$ you would need something like $\hat y_i = 20 +x_i$ or $\hat y_i = 26 - x_i$, both of which would be worse fits (shown in pink below).

  • If you do not allow an intercept, then the fitted line is $\hat y_i = \frac{342}{55}x_i $ and so the variance of the $\hat y_i$ is over $38$ times the variance of the $y_i$, largely because a line through the origin is nowhere near a sensible model of this data. To get the variance of the $\hat y_i$ down to that of the $y_i$ you would need something like $\hat y_i = x_i$ or $\hat y_i = - x_i$, neither of which goes anywhere near the data.
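These numbers can be verified directly; a NumPy sketch using the data from the answer:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([25.0, 21.0, 23.0, 24.0, 22.0])

# With an intercept: slope = Sxy/Sxx, intercept = ȳ - slope·x̄
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
fit1 = intercept + slope * x
print(slope, intercept)                # -0.3 and 23.9
print(np.var(fit1) / np.var(y))        # 0.09, i.e. R^2

# Through the origin: slope = sum(xy)/sum(x^2) = 342/55
beta0 = np.sum(x * y) / np.sum(x ** 2)
fit2 = beta0 * x
print(np.var(fit2) / np.var(y))        # about 38.7, far above 1
```

The variance ratio is $R^2$ in the intercept case but $\beta^2 \sum x_i^2$-driven in the through-origin case, which is why it can blow up when the origin constraint is wrong.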

This is illustrated below:

  • the black Os show the points
  • the blue regression line minimises the sum of squares of the residuals
  • the cyan lines (through the means of the data) illustrate what would make the variance of the fitted values match the variance of the original values
  • the red line minimises the sum of squares of the residuals given that the intercept is $0$
  • the pink lines illustrate what would make the variance of the fitted data match the variance of the original data given that the intercept is $0$

The blue line is not a particularly good fit to the data, but the others are clearly worse.

[Figure: the regression lines described in the list above]