Is the least-squares solution unique?


I am looking for the line closest to $(-5, -2)$, $(-2, 0)$, $(-1, 0)$, $(2, 3)$, $(5, 4)$ using a least-squares solution. So I set the line as $$ax+by+c=0,$$ let $a=1$ (assuming $a \ne 0$, obviously), so each point gives $by + c = -x$, and got

$$\begin{pmatrix} -2 & 1 \\ 0 & 1 \\ 0 & 1 \\ 3 & 1 \\ 4 & 1\\ \end{pmatrix} \begin{pmatrix} b \\ c \\ \end{pmatrix}= \begin{pmatrix} 5 \\ 2 \\ 1 \\ -2 \\ -5\\ \end{pmatrix} $$

Then I solved it by multiplying both sides by $A^T$ and solving the resulting normal equations.

But the $(b, c)$ I got here gives a different line from the usual solution using $$y=ax+b.$$ Although the two lines are almost identical (which suggests I am not far off), they are still different. What's the matter?
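For concreteness, here is a sketch (assuming NumPy) that runs both fits on the data points from the question and converts each result to slope/intercept form so the two lines can be compared directly. The variable names are mine, not from the original post.

```python
import numpy as np

# The five data points from the question.
pts = np.array([(-5, -2), (-2, 0), (-1, 0), (2, 3), (5, 4)], dtype=float)
x, y = pts[:, 0], pts[:, 1]

# Parameterization 1: x + b*y + c = 0, i.e. b*y + c = -x.
A1 = np.column_stack([y, np.ones(len(y))])
(b, c), *_ = np.linalg.lstsq(A1, -x, rcond=None)
# Convert to slope/intercept form: y = m1*x + k1.
m1, k1 = -1.0 / b, -c / b

# Parameterization 2: y = a*x + b (the usual regression of y on x).
A2 = np.column_stack([x, np.ones(len(x))])
(m2, k2), *_ = np.linalg.lstsq(A2, y, rcond=None)

print(m1, k1)  # first fit:  m1 = 24/37  ≈ 0.6486, k1 = 209/185 ≈ 1.1297
print(m2, k2)  # second fit: m2 = 185/294 ≈ 0.6293, k2 = 331/294 ≈ 1.1259
# Close, but genuinely different lines.
```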

Accepted answer:

I think the reason we get different solutions here is that we're measuring different squared errors.

In the first formulation, the right-hand-side vector holds the (negated) $x$-coordinates, and we minimize its distance to the column space of a matrix built from $y$-coordinates and constants. In other words, we minimize the sum of squared *horizontal* residuals.

In the second formulation, the right-hand-side vector holds the $y$-coordinates, and we minimize its distance to the column space of a matrix built from $x$-coordinates and constants: the sum of squared *vertical* residuals.

Yes, we're approximating the same data set with the same goal in mind, so we get similar lines. But we're working with different vectors and matrices, which means we're minimizing different least-squares criteria. Different criteria give different answers; the same underlying data gives similar ones.
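This claim can be checked numerically: each fitted line is optimal under its own residual metric and suboptimal under the other's. A sketch assuming NumPy; the exact slope/intercept fractions below are my own computation from the question's data, not from the original post.

```python
import numpy as np

pts = np.array([(-5, -2), (-2, 0), (-1, 0), (2, 3), (5, 4)], dtype=float)
x, y = pts[:, 0], pts[:, 1]

def vertical_ss(m, k):
    # Sum of squared vertical residuals for the line y = m*x + k.
    return np.sum((y - (m * x + k)) ** 2)

def horizontal_ss(m, k):
    # Sum of squared horizontal residuals: solve the line for x first.
    return np.sum((x - (y - k) / m) ** 2)

m1, k1 = 24/37, 209/185     # line from the x + b*y + c = 0 fit
m2, k2 = 185/294, 331/294   # line from the y = a*x + b fit

# Each line wins under its own metric and loses under the other's.
assert horizontal_ss(m1, k1) < horizontal_ss(m2, k2)
assert vertical_ss(m2, k2) < vertical_ss(m1, k1)
```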

Another answer:

Imagine you have two data points $(2,4)$ and $(2,6)$. If we put the line formula $y = mx + b$ into matrix form, our $X$ matrix is \begin{pmatrix}2 & 1 \\2 & 1 \\\end{pmatrix}. The first column holds the inputs $x$, and the second column holds $1$s for the intercept $b$. We use $1$ there to give the intercept freedom to move (a $0$ would kill $b$ and reduce the model to $mx = y$). The unknown coefficient vector is $(m, b)$, and multiplying $X$ by it should give the output vector $(4, 6)$.

Now, think about how you would fit a line through these two data points. The problem is that there are infinitely many least-squares solutions, one for each choice of the intercept $b$. Why do we have infinitely many? Because the column space of $X$ is not full rank: its two columns are linearly dependent.
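This rank-deficient case can be verified with NumPy's `lstsq`, which reports the rank and, when infinitely many least-squares solutions exist, returns the minimum-norm one among them (a sketch assuming NumPy; the specific assertions are mine).

```python
import numpy as np

# Two points with the same x-coordinate: (2, 4) and (2, 6).
X = np.array([[2.0, 1.0],
              [2.0, 1.0]])
y = np.array([4.0, 6.0])

# The columns of X are linearly dependent, so the normal equations
# X^T X w = X^T y have infinitely many solutions.
assert np.linalg.matrix_rank(X) == 1

# lstsq picks the minimum-norm least-squares solution among them.
w, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
m, b = w

# Every least-squares solution satisfies 2m + b = 5: the best fit
# predicts y = 5 for both points, the average of 4 and 6.
print(2 * m + b)  # 5.0
```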

This is the most intuitive way to think about it, in my opinion. Once you understand it in 2D, you can trust that the same reasoning carries over to $n$ dimensions (and there is no reason it shouldn't).