I'm following the text Scientific Computing by Michael Heath, which states $\left[\operatorname{cond}(A)\right]^2 = \|A\|^2\|\left(A^TA\right)^{-1}\|$.
I know that for rectangular matrices $\operatorname{cond}(A) = \|A\| \|A^+\|,$ where $A^+ = \left(A^TA\right)^{-1}A^T$ (for $A$ with full column rank), so I am not sure where the formula above comes from.
Why is it not $$[\operatorname{cond}(A)]^2 = \|A\|^2 \|A^+\|^2 = \|A\|^2 \|\left(A^TA\right)^{-1}A^T\|^2?$$
Is there another identity I am missing?
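The two expressions are actually the same in the 2-norm. To see why, assume $A$ has full column rank (so that $A^TA$ is invertible) and let $A = U \Sigma V^T$ be its reduced SVD, with $\Sigma$ square and invertible. Then $A^TA = V \Sigma^2 V^T$, and (using the orthogonal invariance of the 2-norm):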
$$ \| (A^{T} A)^{-1} \| = \| V \Sigma^{-2} V^{T}\| = \| \Sigma^{-2} \| = \frac{1}{\sigma_{\min}^2(A)}, $$
whereas the expression you are thinking about yields (again using the unitary invariance of the norm):
$$ \begin{aligned} \| (A^{T} A)^{-1} A^{T} \|^2 &= \| V \Sigma^{-2} V^{T} V \Sigma U^{T}\|^2 = \| V \Sigma^{-2} \Sigma U^{T} \|^2 \\ &= \| V \Sigma^{-1} U^{T} \|^2 = \|\Sigma^{-1}\|^2 = \left(\frac{1}{\sigma_{\min}(A)}\right)^2, \end{aligned} $$
which is the same value. Here, $\sigma_{\min}(A)$ denotes the smallest singular value of $A$, which is nonzero because $A^TA$ is assumed invertible. Either way, $[\operatorname{cond}(A)]^2 = \|A\|^2 / \sigma_{\min}^2(A)$.
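If you want to convince yourself numerically, here is a minimal NumPy sketch. It assumes the 2-norm throughout and uses a random tall matrix (my own choice for illustration, not from Heath's text), which has full column rank with probability 1. The variable names are illustrative only.

```python
import numpy as np

# Random tall matrix; assumed to have full column rank (true with probability 1).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))

norm_A = np.linalg.norm(A, 2)  # spectral norm = largest singular value

# Form from the book: ||A||^2 * ||(A^T A)^{-1}||
heath = norm_A**2 * np.linalg.norm(np.linalg.inv(A.T @ A), 2)

# Form from the question: ||A||^2 * ||A^+||^2
pinv_form = norm_A**2 * np.linalg.norm(np.linalg.pinv(A), 2) ** 2

# Direct value from the singular values: (sigma_max / sigma_min)^2
sigma = np.linalg.svd(A, compute_uv=False)
ratio = (sigma[0] / sigma[-1]) ** 2

print(heath, pinv_form, ratio)
```

All three printed numbers should agree up to rounding error. Note that explicitly forming $A^TA$ squares the condition number, so for badly conditioned $A$ the first value may lose accuracy relative to the other two.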