Relating two approaches to least-squares fitting


I have read about two different approaches to calculating a least-squares fit to points in 2D, and I'm having trouble seeing how these two approaches are related.

So, suppose we have $n$ 2D points $(x_1,y_1), \ldots, (x_n, y_n)$ that we want to fit with a straight line. I want a so-called total least squares fit, i.e. I want to minimize the sum of the squared orthogonal distances from the points to the line.

Let's suppose the data points have already been centered, so that $\sum x_i = 0$ and $\sum y_i = 0$. Then it's pretty easy to show that the best-fit line passes through the point $(0,0)$, so it has the form $Ax + By = 0$, and we just need to find $A$ and $B$.

This paper gives one way to calculate $A$ and $B$. We define $$ a = \sum x_i^2 - \sum y_i^2 \quad\quad ; \quad\quad b = \sum x_iy_i $$ and then $$ A = 2b \quad\quad ; \quad\quad B = - a - \sqrt{a^2 + 4b^2} $$ OK. Cool. Nice and simple.
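As a quick sanity check, here is a minimal sketch of those formulas in Python, on some made-up centered points (the data values are purely illustrative):

```python
import math

# Hypothetical sample points, centered so that sum(x) = sum(y) = 0
pts = [(-2.0, -1.9), (-1.0, -1.2), (0.0, 0.1), (1.0, 0.8), (2.0, 2.2)]
xs = [p[0] for p in pts]
ys = [p[1] for p in pts]
assert abs(sum(xs)) < 1e-9 and abs(sum(ys)) < 1e-9

# The paper's closed-form coefficients for the line Ax + By = 0
a = sum(x * x for x in xs) - sum(y * y for y in ys)
b = sum(x * y for x, y in pts)
A = 2 * b
B = -a - math.sqrt(a * a + 4 * b * b)
print(A, B)
```

For these points the normal $(A, B)$ comes out with $A > 0 > B$, i.e. a line of positive slope, which matches the eyeballed trend of the data.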

But a more common approach (like here, for example) is to form the matrix $$ M = \begin{bmatrix} x_1 & \ldots & x_n \\ y_1 & \ldots & y_n \\ \end{bmatrix} $$ and then the normal vector $(A,B)$ of the best fit line is given by an eigenvector of $MM^T$ corresponding to its smallest eigenvalue.
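The two routes can be compared numerically. Below is a sketch (assuming NumPy is available, and using the same made-up centered points as an example) that computes the smallest-eigenvalue eigenvector of $MM^T$ and checks that it is parallel to the closed-form $(A, B)$:

```python
import math
import numpy as np

# Hypothetical centered sample points
pts = [(-2.0, -1.9), (-1.0, -1.2), (0.0, 0.1), (1.0, 0.8), (2.0, 2.2)]
M = np.array(pts).T          # 2 x n matrix of coordinates
S = M @ M.T                  # [[sum x^2, sum xy], [sum xy, sum y^2]]

# Eigenvector of M M^T for the smallest eigenvalue
# (np.linalg.eigh returns eigenvalues in ascending order)
w, V = np.linalg.eigh(S)
v = V[:, 0]

# Closed-form (A, B) from the paper's formulas
a = S[0, 0] - S[1, 1]
b = S[0, 1]
AB = np.array([2 * b, -a - math.sqrt(a * a + 4 * b * b)])

# If the two normals are parallel, their 2D cross product vanishes
print(v[0] * AB[1] - v[1] * AB[0])
```

The printed cross product is zero up to floating-point error, so the two recipes do pick out the same direction (up to scale and sign, since an eigenvector is only determined up to a scalar).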

So, my question: the simple formulas for $A$ and $B$ are somehow magically telling me about the eigenvectors of a matrix. How can that be? I don't see the connection at all.

Accepted answer

For a $2 \times 2$ matrix (here, the matrix $MM^T$), we have the two facts

$$\lambda_1+\lambda_2 = \operatorname{tr}(MM^T)$$ and $$\lambda_1\lambda_2 = \det(MM^T)$$

where $\lambda_1,\lambda_2$ are its two eigenvalues. That's the "magic connection".

There's still work you would have to do: compute the trace and the determinant, and then solve for the two eigenvalues from knowing their sum and their product. (Hint: eliminate one of them, and you should end up with a quadratic equation in the other. Your formula for $B$ sure looks like an application of the quadratic formula.)
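Carrying that hint through, in the question's notation (a sketch, with $a = \sum x_i^2 - \sum y_i^2$ and $b = \sum x_i y_i$ as defined above): the relevant matrix is
$$MM^T = \begin{bmatrix} \sum x_i^2 & b \\ b & \sum y_i^2 \end{bmatrix},$$
with trace $\sum x_i^2 + \sum y_i^2$ and determinant $\sum x_i^2 \sum y_i^2 - b^2$. The quadratic formula gives the smaller eigenvalue
$$\lambda_{\min} = \frac{\sum x_i^2 + \sum y_i^2 - \sqrt{a^2 + 4b^2}}{2},$$
using $\left(\sum x_i^2 + \sum y_i^2\right)^2 - 4\left(\sum x_i^2 \sum y_i^2 - b^2\right) = \left(\sum x_i^2 - \sum y_i^2\right)^2 + 4b^2 = a^2 + 4b^2$. An eigenvector $(A, B)$ for $\lambda_{\min}$ satisfies $\left(\sum x_i^2 - \lambda_{\min}\right)A + bB = 0$, and since
$$\sum x_i^2 - \lambda_{\min} = \frac{a + \sqrt{a^2 + 4b^2}}{2},$$
one solution is $A = 2b$, $B = -a - \sqrt{a^2 + 4b^2}$, which is exactly the paper's pair of formulas.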

It's a bit messy, but we do get those fairly simple formulas in the end. That's only because we're dealing with a $2\times 2$ matrix, whose eigenvalues obey fairly nice equations; it would all be considerably messier even in the $3\times 3$ case.

[Note: I confess that I didn't carry out the math just now to verify that it works exactly as I've described. But I'm pretty sure I did this before, and I'd bet a few bucks that it does.]