Why is vertical distance the standard in least squares?


When fitting a curve in $\mathbb R^2$ to data points in $\mathbb R^2$ (example), why is each point's vertical distance from the curve squared instead of its shortest (possibly diagonal) distance from the curve?

Ignoring my poorly drawn curves, it seems obvious that the fit in https://i.imgur.com/vy6ioMg.png is worse than the one in https://i.imgur.com/9m22M8p.png even though the red line (a) is shorter in the first image, because a much shorter blue line (labeled b in the second image) can be drawn instead. Minimizing $b^2$ seems much more important than minimizing $a^2$.

3 Answers

Best Answer

In fact, diagonal distance is used in some cases.

In practice, the standard "vertical" distance is used when it is assumed that, in the available 2D data, the $x$ values have been measured (almost) exactly, while the $y$ values are subject to error.

When both $x$ and $y$ are subject to error, under the usual assumptions of independence, near-Gaussian distributions, etc., the distance should be taken along a line with slope $\sigma_y / \sigma_x$. This approach is called total least squares.
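To make the contrast concrete, here is a minimal sketch comparing the two fits on made-up data (the data values are hypothetical, chosen only for illustration). Ordinary least squares minimizes vertical distances; total least squares with equal error variances minimizes perpendicular distances, and its line runs along the first principal component of the centered data:

```python
import numpy as np

# Hypothetical data with noise in both x and y.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.1, 1.9, 3.2, 3.9])

# Ordinary least squares: minimizes the sum of squared VERTICAL distances.
m_ols, b_ols = np.polyfit(x, y, 1)

# Total least squares (assuming equal error variances, sigma_y = sigma_x):
# minimizes the sum of squared PERPENDICULAR distances. The best-fit line
# passes through the centroid along the first principal direction of the
# centered data, which the SVD gives as the first right singular vector.
X = np.column_stack([x - x.mean(), y - y.mean()])
_, _, Vt = np.linalg.svd(X, full_matrices=False)
direction = Vt[0]                    # first right singular vector
m_tls = direction[1] / direction[0]  # slope of the principal axis
b_tls = y.mean() - m_tls * x.mean()
```

With nearly linear data the two slopes are close; they diverge as the noise in $x$ grows, since OLS attributes all the scatter to $y$.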

Answer

The idea behind many examples of least-squares fit is to use it as a predictive model: given some points $\{(x_1,y_1), \dots, (x_n,y_n)\}$, you find a curve that takes an input $x$ and gives you a predicted value of $f(x)$.

In this case, the vertical distance $|y_1 - f(x_1)|$ is the difference between the actual known value $y_1$ and the predicted value $f(x_1)$. (And $(y_1 - f(x_1))^2$ is the squared error.) This is the actual measurement of error we care about, because it's a prediction of how far off we'll be if we guess $f(x)$ for a different value $x$.
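The prediction framing above can be sketched as follows, with hypothetical data points; the vertical residuals $y_i - f(x_i)$ are exactly the prediction errors on the known data, and their sum of squares is the quantity least squares minimizes:

```python
import numpy as np

# Hypothetical known points, roughly y = 2x + 1 with vertical noise.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Least-squares line: minimizes the sum of squared vertical residuals.
m, b = np.polyfit(x, y, 1)

def predict(x_new):
    """Predicted value f(x) = m*x + b for a new input."""
    return m * x_new + b

# Vertical residuals = actual value minus predicted value at each x_i.
residuals = y - predict(x)
sse = np.sum(residuals**2)  # the squared error that the fit minimizes
```

Note that a perpendicular residual would mix the error in $y$ with a displacement in $x$, which is meaningless here: the input $x$ is given, and only the predicted output can be wrong.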

Answer

Your data points are $(x_i, y_i)$ and your regression line is $y=mx+b.$

The absolute error in estimation at each point is $|y_i - mx_i - b|$, which is the vertical distance from your data point to your regression line.
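For comparison, here is a short sketch of both distances from a single point to a line $y = mx + b$ (the numbers are hypothetical). The perpendicular distance is the vertical distance divided by $\sqrt{1+m^2}$, so it is always the smaller of the two:

```python
import math

# Hypothetical line y = m*x + b and a point (x0, y0) off the line.
m, b = 2.0, 1.0
x0, y0 = 1.0, 5.0

# Vertical distance: the residual that least squares uses.
vertical = abs(y0 - (m * x0 + b))

# Perpendicular (shortest) distance from the point to the line.
perpendicular = vertical / math.sqrt(1 + m**2)
```

This is why the blue line in the question is shorter than the red one: the two distances differ by the fixed factor $\sqrt{1+m^2}$, which depends only on the slope.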