I was going through the chapter on the pseudoinverse in Strang's *Introduction to Linear Algebra*, and it says:
> The vector $x^+=A^+b$ is the shortest possible solution to $A^TA\hat{x}=A^Tb$. Reason: The difference $\hat{x}-x^+$ is in the nullspace of $A^TA$. This is also the nullspace of $A$, orthogonal to $x^+$.
I understand that this is essentially saying that $x^+$ is the shortest least-squares solution of $Ax=b$, but I'm having a hard time understanding the reason Strang gives.

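The Moore-Penrose pseudoinverse $A^+$ satisfies the four Penrose conditions, which I write using $P=AA^+$ and $Q=A^+A$:
$$ PA=A,\qquad QA^+=A^+,\qquad P^T=P,\qquad Q^T=Q. $$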
With this in mind it is easy to see that $x^+=A^+b$ is a solution of $A^TA\hat x=A^Tb$: using $AA^+=P=P^T$ and $PA=A$,
$$ A^TAx^+=A^TAA^+b=A^TPb=A^TP^Tb=(PA)^Tb=A^Tb. $$

Moreover, it is the solution of smallest norm. For any other solution $\hat x$, let $z=\hat x-x^+$. Then
$$ A^TAz=A^TA(\hat x-x^+)=A^TA\hat x-A^TAx^+=A^Tb-A^Tb=0. $$
That is, *the difference $\hat x-x^+$ is in the nullspace of $A^TA$*. Premultiplying by $z^T$ gives $z^TA^TAz=0$ $\Leftrightarrow$ $\|Az\|^2=0$ $\Leftrightarrow$ $Az=0$. That is, *this is also the nullspace of $A$*.

Now, using $QA^+=A^+$ and $Q^T=Q$,
$$ z^Tx^+=z^TA^+b=z^TQA^+b=z^TQ^TA^+b=(Qz)^TA^+b=(A^+\underbrace{Az}_{=0})^TA^+b=0. $$
So $z\perp x^+$, which is the *orthogonal to $x^+$* part. Therefore
$$ \|\hat x\|^2=\|x^++z\|^2=\|x^+\|^2+2\underbrace{z^Tx^+}_{=0}+\|z\|^2=\|x^+\|^2+\|z\|^2\ge\|x^+\|^2, $$
with equality only when $z=0$, i.e. $\hat x=x^+$.
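If it helps to see each step of the argument on concrete numbers, here is a small sketch in pure Python with exact rational arithmetic. It assumes a rank-one matrix $A=uv^T$, chosen because its pseudoinverse has the closed form $A^+=vu^T/(\|u\|^2\|v\|^2)$, so no numerical library is needed. The vectors `u`, `v`, `b`, the alternative solution `x_hat`, and the helper functions (`matmul`, `transpose`, `matvec`, `dot`) are all illustrative choices, not anything from Strang's book.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows (hypothetical helper)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def matvec(X, x):
    return [sum(a * c for a, c in zip(row, x)) for row in X]


def dot(x, y):
    return sum(a * c for a, c in zip(x, y))


# Assumed example: rank-one A = u v^T (2 x 3), so A^+ = v u^T / (|u|^2 |v|^2) (3 x 2).
u = [Fraction(1), Fraction(2)]
v = [Fraction(1), Fraction(1), Fraction(1)]
A = [[ui * vj for vj in v] for ui in u]
A_plus = [[vi * uj / (dot(u, u) * dot(v, v)) for uj in u] for vi in v]
b = [Fraction(3), Fraction(1)]

# Penrose conditions with P = A A^+ and Q = A^+ A.
P = matmul(A, A_plus)
Q = matmul(A_plus, A)
assert P == transpose(P) and Q == transpose(Q)
assert matmul(P, A) == A and matmul(Q, A_plus) == A_plus

# x^+ = A^+ b solves the normal equations A^T A x = A^T b.
AtA = matmul(transpose(A), A)
Atb = matvec(transpose(A), b)
x_plus = matvec(A_plus, b)
assert matvec(AtA, x_plus) == Atb

# Another solution (here the normal equations reduce to x1 + x2 + x3 = 1).
x_hat = [Fraction(1), Fraction(0), Fraction(0)]
assert matvec(AtA, x_hat) == Atb

z = [a - c for a, c in zip(x_hat, x_plus)]
assert matvec(A, z) == [0, 0]      # z lies in the nullspace of A
assert dot(z, x_plus) == 0         # z is orthogonal to x^+

# Pythagoras: |x_hat|^2 = |x^+|^2 + |z|^2
print(dot(x_hat, x_hat), "=", dot(x_plus, x_plus), "+", dot(z, z))  # prints 1 = 1/3 + 2/3
```

Exact `Fraction` arithmetic is used so the equalities hold exactly rather than up to floating-point round-off; with a general matrix one would instead compute $A^+$ numerically (for example via the SVD).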