Explain why $x^+=A^+b$ is the shortest possible solution to $A^TA\hat{x}=A^Tb$

1.5k Views Asked by At

I'm was going through the chapter on pseudoinverse in intro to linear algebra by Strang, and it says

The vector $x^+=A^+b$ is the shortest possible solution to $A^TA\hat{x}=A^Tb$ Reason: The difference $\hat{x}-x^+$ is in the nullspace of $A^TA$. This is also the nullspace of A, orthogonal to $x^+$.

I get that it is essentially saying $x^+$ is the best least squares solution for $Ax=b$. But I'm having a difficult time understanding the reason provided by Strang.

3

There are 3 best solutions below

2
On BEST ANSWER

The Moore-Penrose pseudoinverse $A^+$ satisfies

  1. $P=AA^+$ is the orthogonal projector, i.e. $PA=A$ and $P=P^T$, and
  2. $Q=A^+A$ is the orthogonal projector, i.e. $QA^+=A^+$ and $Q=Q^T$.

With this in mind it is easy to see that $x^+=A^+b$ is a solution to $A^TA\hat x=A^Tb$ as $$ A^TAx^+=A^TAA^+b=A^TPb=A^TP^Tb=(PA)^Tb=A^Tb. $$ Moreover, it is the smallest norm solution. Let $z=\hat x-x^+$ for another solution $\hat x$, then $$ A^TAz=A^TA(\hat x-x^+)=A^TA\hat x-A^TAx^+=A^Tb-A^Tb=0. $$ That is, The difference $\hat x−x^+$ is in the nullspace of $A^TA$. Pre-multiply by $z^T$ to get $z^TA^TAz=0$ $\Leftrightarrow$ $\|Az\|^2=0$ $\Leftrightarrow$ $Az=0$. That is This is also the nullspace of $A$. Now $$ z^Tx^+=z^TA^+b=z^TQA^+b=z^TQ^TA^+b=(Qz)^TA^+b=(A^+\underbrace{Az}_{=0})^TA^+b=0. $$ So $x^+\bot z$. That is, orthogonal to $x^+$. Therefore $$ \|\hat x\|^2=\|x^++z\|^2=\|x^+\|^2+2\underbrace{z^Tx^+}_{=0}+\|z\|^2=\|x^+\|^2+\|z\|^2\ge\|x^+\|^2. $$

0
On

At this point in the book, Strang has established that $A A^+$ projects onto the range of $A$.

It follows that if $x^+ = A^+ b$ then $$ A^T A x^+ = A^T A A^+ b = A^T \tilde b $$ where $\tilde b$ is the projection of $b$ onto the range of $A$.

Now, Strang has claimed that if $A^T A \hat x = A^T b$ then $\hat x - x^+$ is in the null space of $A^T A$, which is equivalent to claiming that $$ \underbrace{A^T A x^+}_{A^T \tilde b} = \underbrace{A^T A \hat x}_{A^T b}. $$ Is it really true that $A^T \tilde b = A^T b$?

To see that this is true, decompose $b$ as $$ b = \tilde b + \bar b, $$ where $\bar b$ is the projection of $b$ onto the null space of $A^T$. (This decomposition uses the fact that the range of $A$ is the orthogonal complement of the null space of $A^T$.) We see that $$ A^T b = A^T \tilde b + A^T \bar b = A^T \tilde b. $$ This proves the claim mentioned above.

To finish off the problem, note that $x^+$ belongs to the range of $A^T$, and $\hat x - x^+$ belongs to the null space of $A$. Hence, $$ x^+ \perp \hat x - x^+. $$ It now follows from the Pythagorean theorem that $$ \| \hat x \|^2 = \| x^+ \|^2 + \| \hat x - x^+ \|^2. $$ This shows that $\| \hat x \| \geq \| x^+ \|$.

2
On

There are good answers posted. Here is another perspective.

Start with the matrix $\mathbf{A}\in\mathbb{C}^{m\times n}$, the data vector $b\in\mathbb{C}^{m}$ which has a range space component. We are looking for a solution vector $x\in\mathbb{C}^{n}$. We are talking about the linear system $\mathbf{A} x = b$. Specifically, the least squares problem where $\mathbf{A}x -b > 0$. We are discussing the solution to the least squares problem $$ x_{LS} = \left\{ x\in\mathbb{C}^{n} \colon \lVert Ax - b \rVert_{2}^{2} \text{ is minimized} \right\} $$ This set of minimizers is given by $$ x_{LS} = \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger} \mathbf{A} \right) y}, \quad y \in \mathbb{C}^{n} $$ where the coloring distinguishes $\color{blue}{range}$ space and $\color{red}{null}$ space vectors. The least squares minimizers are the affine space represented by the dashed red line.

enter image description here

The need for least squares arises because the data vector is not confined the range space: $$ b = \color{blue}{b_{\mathcal{R}}} + \color{red}{b_{\mathcal{N}}} $$

What are the lengths of the vectors in this set? $$ \begin{align} \lVert x_{LS} \rVert_{2}^{2} &= \lVert \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger} \mathbf{A} \right) y} \rVert_{2}^{2} \\ % &= \lVert \color{blue}{\mathbf{A}^{\dagger}b} \rVert_{2}^{2} + \lVert \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger} \mathbf{A} \right) y} \rVert_{2}^{2} \end{align} $$ How to minimize this length? Adjust $y$ so that there is no null space contribution. That is, the least squares minimizers of least length is $$ \color{blue}{x_{LS}} = \color{blue}{\mathbf{A}^{\dagger}b}. $$