I came across this thread discussing why we use least squares for curve fitting:
Why do we use a Least Squares fit?
One answer by Chris Taylor begins by assuming a model of the form
$$ y_i=ax_i+e_i $$
This reference:
http://www.bradthiessen.com/html5/docs/ols.pdf
also supports Chris' choice and states that "We could measure distance from the points to the line horizontally, perpendicularly, or vertically. Since we oftentimes use regression to predict values of Y from observed values of X, we choose to measure the distance vertically."
But would it not be better to measure the perpendicular distance? For example, if we assume that our fitted line is as above, then the squared distance $\Delta_i^2$ from a point $(x_i, y_i)$ on the line to an actual data point $(x_0, y_0)$ would be $\Delta_i^2 = (x_i - x_0)^2 + (y_i - y_0)^2$,
so the total squared error, $E$, would be $$ E = \sum_{i=1}^n \Delta_i^2 $$ This is where I get a little confused. I know that
$$ y_i=ax_i+e_i $$ $$ x_i=(y_i-e_i)/a $$
so substituting for $(\Delta_i)^2$
$$ E = \sum_{i=1}^n \left(\frac{y_i-e_i}{a} - x_0\right)^2 + \left((ax_i+e_i) - y_0\right)^2 $$
Then I differentiate with respect to $a$ and set that derivative equal to zero? Not only am I getting lost here, but my LaTeX skills are failing. The equation gets pretty complicated, but it can be worked out. The question is: is it more accurate to use perpendicular distance rather than vertical distance?
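As an aside, you can sidestep the substitution by using the closed-form perpendicular distance from a point $(x_0, y_0)$ to the line $y = ax + b$, namely $|ax_0 + b - y_0|/\sqrt{1+a^2}$. And if you want to compare the two fits numerically rather than pushing through the algebra, here is a minimal sketch (NumPy, with a made-up synthetic dataset): the vertical-distance fit via `np.polyfit`, and the perpendicular-distance (total least squares) fit via the SVD of the centered data, using the standard fact that the minimizing line passes through the centroid and points along the first principal direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: true line y = 2x + 1, with noise in y only
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 1, size=x.size)

# Vertical-distance (ordinary) least squares
a_ols, b_ols = np.polyfit(x, y, 1)

# Perpendicular-distance (total) least squares: center the data,
# then the best-fit line points along the direction of maximal
# variance, i.e. the first right singular vector of the point cloud.
pts = np.column_stack([x - x.mean(), y - y.mean()])
U, S, Vt = np.linalg.svd(pts, full_matrices=False)
dx, dy = Vt[0]                     # direction of the fitted line
a_tls = dy / dx
b_tls = y.mean() - a_tls * x.mean()

print(f"OLS: slope {a_ols:.3f}, intercept {b_ols:.3f}")
print(f"TLS: slope {a_tls:.3f}, intercept {b_tls:.3f}")
```

With noise only in $y$, the two fits come out very close, which previews the answer below: the choice matters only when $x$ also carries error.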
The reason for using vertical distance is that often you have a knob you turn that controls the important parameter of the experiment, and then you measure the output. We believe you can set this parameter exactly (or so close that any error in the point lies entirely in the measurement of the $y$ value, not the $x$ value). This is appropriate as long as the error in $x$ is small compared to the error in $y$ divided by $\frac{dy}{dx}$. This is often true, but not always.
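To illustrate when the choice matters, here is a hedged numerical sketch (synthetic data, with equal noise assumed in both coordinates): once $x$ also carries measurement error, the vertical-distance slope is biased toward zero (the classic attenuation effect), while the perpendicular-distance fit recovers the true slope.

```python
import numpy as np

rng = np.random.default_rng(1)

# True relationship y = 2x, but BOTH coordinates are measured with
# noise of the same standard deviation (a made-up example).
x_true = np.linspace(0, 10, 500)
x = x_true + rng.normal(0, 1.0, size=x_true.size)      # error in x too
y = 2 * x_true + rng.normal(0, 1.0, size=x_true.size)  # error in y

# Vertical-distance fit: the noisy x inflates the denominator of the
# slope estimate, pulling the slope below its true value of 2.
a_ols = np.polyfit(x, y, 1)[0]

# Perpendicular-distance fit via SVD of the centered data: with equal
# noise in x and y, this recovers the true slope.
pts = np.column_stack([x - x.mean(), y - y.mean()])
dx, dy = np.linalg.svd(pts, full_matrices=False)[2][0]
a_tls = dy / dx

print(f"OLS slope: {a_ols:.3f} (true slope is 2)")
print(f"TLS slope: {a_tls:.3f}")
```

Note the hidden assumption: the perpendicular fit is only the "right" estimator here because the noise is the same size in both coordinates. If the variances differ, the appropriate generalization weights the two directions accordingly (Deming regression), which reduces to ordinary least squares as the $x$ error goes to zero.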