How come least square can have many solutions?


I know there always exists a least-squares solution $\hat{x}$, regardless of the properties of the matrix $A$. However, I keep finding online that least squares can have infinitely many solutions if $A$ does not have full column rank.

Shouldn't $\hat{x}$ always be unique, since minimizing a quadratic function (the squared error) always yields a global minimum? So regardless of what the matrix $A$ is (even a badly constructed matrix with dependent columns), shouldn't least squares find a single 'best' solution $\hat{x}$?

Is there an easy (or intuitive) proof showing why the least-squares method produces infinitely many solutions when there are dependent columns?

There are 4 answers below.

BEST ANSWER

Your least-squares solution minimizes $\|A\hat x - b\|^2$. If $A$ does not have full column rank, there is some nonzero vector $y$ such that $Ay = 0$. Then $A(\hat x + y) = A\hat x$, so $\|A(\hat x + y) - b\| = \|A\hat x - b\|$: you can add any multiple of $y$ to your solution and get the same residual.
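A quick numerical check of this (a sketch in NumPy; the particular matrix and right-hand side are made up for illustration):

```python
import numpy as np

# A rank-deficient matrix: the third column is the sum of the first two.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

# One least-squares solution.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

# y = (1, 1, -1) satisfies A y = 0, so x_hat + t*y is also a solution.
y = np.array([1.0, 1.0, -1.0])

r1 = np.linalg.norm(A @ x_hat - b)
r2 = np.linalg.norm(A @ (x_hat + 5 * y) - b)
print(r1, r2)  # the two residuals agree
```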

ANSWER

Suppose that there exists $y \in \ker(A)$ with $y \neq 0$. Now let $\hat{x}$ be a least-squares solution; that is, $\hat{x}$ satisfies $\|A\hat{x} - b\|_2 = \min_{x} \|Ax - b\|_2$. Notice that $A(\hat{x} + y) = A\hat{x} + Ay = A\hat{x}$, hence $\|A\hat{x} - b\|_2 = \|A(\hat{x} + y) - b\|_2$.

This means that you have just produced another solution attaining the same minimum residual, so the minimizer is not unique whenever $\ker(A) \neq \{0\}$, which happens exactly when $A$ is not of full column rank.
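Such a kernel vector $y$ can also be found numerically, e.g. from the SVD (a sketch; the rank-deficient matrix below is invented for the demo):

```python
import numpy as np

# Illustrative rank-deficient matrix: the first two columns sum to the third.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 9.0],
              [7.0, 8.0, 15.0]])
b = np.array([1.0, 1.0, 1.0])

# Right singular vectors for (numerically) zero singular values span ker(A).
U, s, Vt = np.linalg.svd(A)
null_mask = s < 1e-10 * s.max()
y = Vt[null_mask][0]            # one unit vector in ker(A)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
r1 = np.linalg.norm(A @ x_hat - b)
r2 = np.linalg.norm(A @ (x_hat + y) - b)
print(r1, r2)                   # equal up to floating-point noise
```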

ANSWER

For what it is worth, the least-squares solution to $$ H \hat{x} = z $$ is obtained by multiplying on the left by $H^t,$ giving $$ H^t H \hat{x} = H^t z. $$ The matrix $H^t H$ is square and symmetric, indeed positive semidefinite. In applications it is called the Gramian of the system.

If the Gramian is nonsingular, therefore invertible, there is a single answer to the original problem, given by $$ \hat{x} = \left( H^t H \right)^{-1} H^t z. $$

The unknown variables are said to be observable when all this happens, which is the case if and only if the columns of $H$ are linearly independent.

From Kalman Filtering: Theory and Practice by Grewal and Andrews
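The normal-equations route can be verified directly when $H$ has full column rank (a sketch; $H$ and $z$ here are arbitrary example data):

```python
import numpy as np

# Full-column-rank H, so the Gramian H^T H is invertible.
H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
z = np.array([0.0, 1.0, 1.5, 3.2])

G = H.T @ H                                 # the Gramian
x_normal = np.linalg.solve(G, H.T @ z)      # solve H^T H x = H^T z
x_lstsq, *_ = np.linalg.lstsq(H, z, rcond=None)
print(x_normal, x_lstsq)                    # the two solutions agree
```

Numerically, one usually prefers `lstsq` (QR/SVD based) over explicitly forming $H^t H$, since the Gramian squares the condition number of the problem.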

ANSWER

A nice remedy is to add the constraint of picking, among all least-squares solutions, the one of least norm. Then the solution is unique whether or not $A$ has full column rank: if $A$ is full rank there is only one least-squares solution to begin with, and otherwise the least-norm constraint singles out exactly one.

The great thing is that the Moore–Penrose pseudoinverse of $A$ always yields this least-squares, least-norm solution: $\hat{x} = A^{+} b$.
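This is easy to see numerically (a sketch; the rank-deficient matrix is made up for the demo). Because $A^{+}b$ lies in the row space of $A$, it is orthogonal to $\ker(A)$, so adding any nonzero kernel vector strictly increases the norm:

```python
import numpy as np

# Rank-deficient A again: infinitely many least-squares solutions.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x_pinv = np.linalg.pinv(A) @ b    # minimum-norm least-squares solution

# y = (1, 1, -1) satisfies A y = 0; x_pinv + y is another solution,
# with the same residual but a strictly larger norm.
y = np.array([1.0, 1.0, -1.0])
print(np.linalg.norm(x_pinv), np.linalg.norm(x_pinv + y))
```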