Explain why $x^+=A^+b$ is the shortest possible solution to $A^TA\hat{x}=A^Tb$

Question

Explain why $x^+=A^+b$ is the shortest possible solution to $A^TA\hat{x}=A^Tb$

1.5k Views Asked by Bumbble Comm At 17 May 2026 - 4:43

I'm was going through the chapter on pseudoinverse in intro to linear algebra by Strang, and it says

The vector $x^+=A^+b$ is the shortest possible solution to $A^TA\hat{x}=A^Tb$ Reason: The difference $\hat{x}-x^+$ is in the nullspace of $A^TA$. This is also the nullspace of A, orthogonal to $x^+$.

I get that it is essentially saying $x^+$ is the best least squares solution for $Ax=b$. But I'm having a difficult time understanding the reason provided by Strang.

Original Q&A

There are 3 best solutions below

Bumbble Comm On 23 Nov 2016 - 9:25

At this point in the book, Strang has established that $A A^+$ projects onto the range of $A$.

It follows that if $x^+ = A^+ b$ then $$ A^T A x^+ = A^T A A^+ b = A^T \tilde b $$ where $\tilde b$ is the projection of $b$ onto the range of $A$.

Now, Strang has claimed that if $A^T A \hat x = A^T b$ then $\hat x - x^+$ is in the null space of $A^T A$, which is equivalent to claiming that $$ \underbrace{A^T A x^+}_{A^T \tilde b} = \underbrace{A^T A \hat x}_{A^T b}. $$ Is it really true that $A^T \tilde b = A^T b$?

To see that this is true, decompose $b$ as $$ b = \tilde b + \bar b, $$ where $\bar b$ is the projection of $b$ onto the null space of $A^T$. (This decomposition uses the fact that the range of $A$ is the orthogonal complement of the null space of $A^T$.) We see that $$ A^T b = A^T \tilde b + A^T \bar b = A^T \tilde b. $$ This proves the claim mentioned above.

To finish off the problem, note that $x^+$ belongs to the range of $A^T$, and $\hat x - x^+$ belongs to the null space of $A$. Hence, $$ x^+ \perp \hat x - x^+. $$ It now follows from the Pythagorean theorem that $$ \| \hat x \|^2 = \| x^+ \|^2 + \| \hat x - x^+ \|^2. $$ This shows that $\| \hat x \| \geq \| x^+ \|$.

Bumbble Comm On 29 Mar 2017 - 2:06

There are good answers posted. Here is another perspective.

Start with the matrix $\mathbf{A}\in\mathbb{C}^{m\times n}$, the data vector $b\in\mathbb{C}^{m}$ which has a range space component. We are looking for a solution vector $x\in\mathbb{C}^{n}$. We are talking about the linear system $\mathbf{A} x = b$. Specifically, the least squares problem where $\mathbf{A}x -b > 0$. We are discussing the solution to the least squares problem $$ x_{LS} = \left\{ x\in\mathbb{C}^{n} \colon \lVert Ax - b \rVert_{2}^{2} \text{ is minimized} \right\} $$ This set of minimizers is given by $$ x_{LS} = \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger} \mathbf{A} \right) y}, \quad y \in \mathbb{C}^{n} $$ where the coloring distinguishes $\color{blue}{range}$ space and $\color{red}{null}$ space vectors. The least squares minimizers are the affine space represented by the dashed red line.

The need for least squares arises because the data vector is not confined the range space: $$ b = \color{blue}{b_{\mathcal{R}}} + \color{red}{b_{\mathcal{N}}} $$

What are the lengths of the vectors in this set? $$ \begin{align} \lVert x_{LS} \rVert_{2}^{2} &= \lVert \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger} \mathbf{A} \right) y} \rVert_{2}^{2} \\ % &= \lVert \color{blue}{\mathbf{A}^{\dagger}b} \rVert_{2}^{2} + \lVert \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger} \mathbf{A} \right) y} \rVert_{2}^{2} \end{align} $$ How to minimize this length? Adjust $y$ so that there is no null space contribution. That is, the least squares minimizers of least length is $$ \color{blue}{x_{LS}} = \color{blue}{\mathbf{A}^{\dagger}b}. $$

**Bumbble Comm** · Accepted Answer

The Moore-Penrose pseudoinverse $A^+$ satisfies

$P=AA^+$ is the orthogonal projector, i.e. $PA=A$ and $P=P^T$, and
$Q=A^+A$ is the orthogonal projector, i.e. $QA^+=A^+$ and $Q=Q^T$.

With this in mind it is easy to see that $x^+=A^+b$ is a solution to $A^TA\hat x=A^Tb$ as $$ A^TAx^+=A^TAA^+b=A^TPb=A^TP^Tb=(PA)^Tb=A^Tb. $$ Moreover, it is the smallest norm solution. Let $z=\hat x-x^+$ for another solution $\hat x$, then $$ A^TAz=A^TA(\hat x-x^+)=A^TA\hat x-A^TAx^+=A^Tb-A^Tb=0. $$ That is, The difference $\hat x−x^+$ is in the nullspace of $A^TA$. Pre-multiply by $z^T$ to get $z^TA^TAz=0$ $\Leftrightarrow$ $\|Az\|^2=0$ $\Leftrightarrow$ $Az=0$. That is This is also the nullspace of $A$. Now $$ z^Tx^+=z^TA^+b=z^TQA^+b=z^TQ^TA^+b=(Qz)^TA^+b=(A^+\underbrace{Az}_{=0})^TA^+b=0. $$ So $x^+\bot z$. That is, orthogonal to $x^+$. Therefore $$ \|\hat x\|^2=\|x^++z\|^2=\|x^+\|^2+2\underbrace{z^Tx^+}_{=0}+\|z\|^2=\|x^+\|^2+\|z\|^2\ge\|x^+\|^2. $$

Explain why $x^+=A^+b$ is the shortest possible solution to $A^TA\hat{x}=A^Tb$

There are 3 best solutions below

Related Questions in LINEAR-ALGEBRA

Related Questions in SELF-LEARNING

Related Questions in LEAST-SQUARES

Related Questions in PSEUDOINVERSE

Trending Questions

Popular # Hahtags

Popular Questions