Difference between least squares and minimum norm solution


Consider a linear system of equations $Ax = b$.

  • If the system is overdetermined, the least squares (approximate) solution minimizes $||b - Ax||^2$. Some sources instead minimize $||b - Ax||$.

  • If the system is underdetermined, one can compute the minimum norm solution. But doesn't that solution also minimize $||b - Ax||$, or am I wrong?

But if least squares is also a minimum norm, what is the difference, or the rationale of the different naming?


BEST ANSWER

First, it's important to understand that there are different norms. Most likely you're interested in the euclidean norm:

$\| x \|_{2} =\sqrt{\sum_{i=1}^{n}x_{i}^{2}}$

Next, note that minimizing $\| b-Ax \|_{2}^{2}$ is equivalent to minimizing $\| b-Ax \|_{2}$, because squaring the norm is a monotone transform. It really doesn't matter which one you minimize.

If $A$ has full column rank, then there is a unique least squares solution.

However, if $A$ doesn't have full column rank, there may be infinitely many least squares solutions. In this case, we're often interested in the minimum norm least squares solution. That is, among the infinitely many least squares solutions, pick out the least squares solution with the smallest $\| x \|_{2}$. The minimum norm least squares solution is always unique. It can be found using the singular value decomposition and/or the Moore-Penrose pseudoinverse.
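As a numerical illustration of this answer (the matrix and data below are made up for the example), numpy's `lstsq` and the Moore-Penrose pseudoinverse both return the minimum norm least squares solution when $A$ is rank deficient:

```python
import numpy as np

# A rank-deficient matrix: column 3 = column 1 + column 2, so rank is 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

# np.linalg.lstsq uses the SVD and returns the minimum norm
# least squares solution.
x_min_norm, *_ = np.linalg.lstsq(A, b, rcond=None)

# The Moore-Penrose pseudoinverse gives the same vector.
x_pinv = np.linalg.pinv(A) @ b
assert np.allclose(x_min_norm, x_pinv)

# Shifting by a null space vector gives another least squares solution
# with the same residual but a strictly larger norm.
null_vec = np.array([1.0, 1.0, -1.0])   # A @ null_vec == 0
assert np.allclose(A @ null_vec, 0)
x_other = x_min_norm + 0.5 * null_vec
r_min = np.linalg.norm(A @ x_min_norm - b)
r_other = np.linalg.norm(A @ x_other - b)
assert np.isclose(r_min, r_other)
assert np.linalg.norm(x_min_norm) < np.linalg.norm(x_other)
```

Shifting along the null space leaves the residual unchanged, which is exactly why the least squares solution is non-unique here.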

ANSWER

If a system is overdetermined, there is typically no exact solution, and thus we may want to find $x$ such that $||Ax-b||$ is as small as it can be (as there is no way to make $||Ax-b||=0$).

On the other hand, if the system is underdetermined, there are infinitely many solutions and thus one can find a solution of minimal norm and this is called the minimum norm solution.
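A small numpy sketch of the underdetermined case (system chosen arbitrarily for illustration): among the infinitely many exact solutions, the pseudoinverse picks the one of smallest Euclidean norm.

```python
import numpy as np

# Underdetermined: 2 equations, 4 unknowns -> infinitely many exact solutions.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 3.0]])
b = np.array([3.0, 2.0])

# The pseudoinverse selects the exact solution of minimum Euclidean norm.
x_mn = np.linalg.pinv(A) @ b
assert np.allclose(A @ x_mn, b)          # it solves the system exactly

# Any other exact solution differs by a null space vector and is longer.
n = np.array([-2.0, 1.0, -1.0, 0.0])     # a vector with A @ n == 0
assert np.allclose(A @ n, 0)
x_other = x_mn + n
assert np.allclose(A @ x_other, b)       # also an exact solution
assert np.linalg.norm(x_mn) < np.linalg.norm(x_other)
```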

ANSWER

Consider the linear system $$ \mathbf{A} x = b $$ where $\mathbf{A}\in\mathbb{C}^{m\times n}_{\rho}$ has rank $\rho$, $b\in\mathbb{C}^{m}$ is the data vector, and $x\in\mathbb{C}^{n}$ is the solution vector.

Least squares problem

A least squares solution is guaranteed to exist and is defined by $$ x_{LS} = \left\{ x\in\mathbb{C}^{n} \colon \lVert \mathbf{A} x - b \rVert_{2}^{2} \text{ is minimized} \right\} $$

Least squares solution

The general least squares problem offers a $\color{blue}{particular}$ solution and a $\color{red}{homogeneous}$ solution. The confusingly named "solution of minimum norm" is just the $\color{blue}{particular}$ solution.

The minimizers are the affine set computed by $$ x_{LS} = \color{blue}{\mathbf{A}^{+} b} + \color{red}{ \left( \mathbf{I}_{n} - \mathbf{A}^{+} \mathbf{A} \right) y}, \quad y \in \mathbb{C}^{n} \tag{1} $$ where vectors are colored according to whether they reside in a $\color{blue}{range}$ space or $\color{red}{null}$ space. (See Laub, 2005, Theorem 8.1, p. 66)
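As a quick numerical check of $(1)$ (matrix and data below are arbitrary), every choice of $y$ produces a vector that attains the same, minimal, residual:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-deficient A (rho < n): row 2 = 2 * row 1, so rank is 2 and the
# minimizer set in (1) is a genuine affine line, not a single point.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
b = np.array([1.0, 1.0, 1.0])

A_pinv = np.linalg.pinv(A)
P_null = np.eye(3) - A_pinv @ A     # projector onto the null space N(A)

# Sweep several random y in (1): the residual never changes.
residuals = []
for _ in range(5):
    y = rng.standard_normal(3)
    x = A_pinv @ b + P_null @ y
    residuals.append(np.linalg.norm(A @ x - b))
assert np.allclose(residuals, residuals[0])
```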

The red dashed line below is the set of the least squares minimizers, which appears when there is a column rank deficit $(\rho < n)$. In these cases, the solution is not unique.

[Figure: hyperplane of least squares minimizers]

Least squares solution of minimum norm

To find the minimizer of minimum norm, that is, the shortest solution vector, compute the length of the solution vectors:

$$ \lVert x_{LS} \rVert_{2}^{2} = \Big\lVert \color{blue}{\mathbf{A}^{+} b} + \color{red}{ \left( \mathbf{I}_{n} - \mathbf{A}^{+} \mathbf{A} \right) y} \Big\rVert_{2}^{2} = \Big\lVert \color{blue}{\mathbf{A}^{+} b} \Big\rVert_{2}^{2} + \Big\lVert \color{red}{ \left( \mathbf{I}_{n} - \mathbf{A}^{+} \mathbf{A} \right) y} \Big\rVert_{2}^{2} $$ The cross term vanishes because the $\color{blue}{range}$ space and $\color{red}{null}$ space components are orthogonal. The $\color{blue}{range}$ space component is fixed, but we can control the $\color{red}{null}$ space vector. In fact, choose the vector $y$ which forces this term to $0$.

Therefore, the least squares solution of minimum norm is the particular solution $$ \color{blue}{x_{LS}} = \color{blue}{\mathbf{A}^{+} b}. $$ This is the point where the red dashed line punctures the blue plane. The least squares solution of minimum length is the point in $\color{blue}{\mathit{R}\!\left(\mathbf{A}^{*}\right)}$.
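The Pythagorean argument above can be verified numerically; in this sketch (same arbitrary example matrix as before) the particular solution has no null space component, so every other minimizer is at least as long:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])   # rank 2
b = np.array([1.0, 1.0, 1.0])

A_pinv = np.linalg.pinv(A)
x_p = A_pinv @ b                      # particular (blue) solution
P_null = np.eye(3) - A_pinv @ A       # projector onto the (red) null space

# The particular solution lies entirely in the row space R(A*):
# its null space component is zero.
assert np.allclose(P_null @ x_p, 0)

# So by Pythagoras the squared norms simply add, and x_p is shortest.
y = np.array([1.0, -2.0, 0.5])
x = x_p + P_null @ y
lhs = np.linalg.norm(x) ** 2
rhs = np.linalg.norm(x_p) ** 2 + np.linalg.norm(P_null @ y) ** 2
assert np.isclose(lhs, rhs)
```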

Full column rank

You ask about the case of full column rank where $n=\rho$. In this case, $$ \color{red}{\mathit{N}\left( \mathbf{A} \right)} = \left\{ \mathbf{0} \right\}, $$ the null space is trivial. There is no null space component, and the least squares solution is a point.

In other words, the complete least squares solution is just the $ \color{blue}{particular}$ solution $$ \color{blue}{x_{LS}} = \color{blue}{\mathbf{A}^{+} b} $$ When the matrix has full column rank, there is no $\color{red}{homogeneous}$ component to the solution. The solution is unique and is a point.
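A short sketch of the full column rank case (example matrix chosen arbitrarily): the null space projector is zero, and the pseudoinverse solution agrees with the unique solution of the normal equations $\mathbf{A}^{*}\mathbf{A}x = \mathbf{A}^{*}b$.

```python
import numpy as np

# Full column rank: 3 columns, rank 3.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0, 0.0])

A_pinv = np.linalg.pinv(A)

# N(A) is trivial: the null space projector is the zero matrix,
# so the homogeneous term in (1) disappears.
assert np.allclose(np.eye(3) - A_pinv @ A, 0)

# The unique least squares solution via pinv matches the normal equations.
x_pinv = A_pinv @ b
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
assert np.allclose(x_pinv, x_ne)
```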