In the least squares method, what does $A^T A$ represent and, similarly, what does the product $A^T b$ represent?
That is, why do we multiply both sides of the equation $Ax = b$ by $A^T$? What does it tell us?
I know the derivation, but I'm looking for an intuitive explanation of the normal equations $$ A^T A x = A^T b. $$
Visually, if $Ax$ is the vector in $R(A)$ which is as close as possible to $b$, then the residual $b - Ax$ is orthogonal to $R(A)$. Since $R(A)$ is spanned by the columns of $A$, $b - Ax$ is orthogonal to each column $a_i$ of $A$, that is, $a_i^T(b - Ax) = 0$ for every $i$. The rows of $A^T$ are exactly the $a_i^T$, so stacking these conditions gives $$ \tag{1} A^T(b-Ax) = 0, $$ which implies that $A^T Ax = A^T b$.
In summary, the visual meaning of equation (1) is that the residual $b - Ax$ is orthogonal to the column space of $A$.
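If it helps to see the orthogonality numerically, here is a minimal sketch in Python with NumPy. The matrix `A` and vector `b` are random and purely illustrative (they are my assumption, not part of the question); the point is only that the least squares residual has zero inner product with every column of `A`, and that solving the normal equations gives the same $x$.

```python
import numpy as np

# Illustrative data only: a tall 6x3 matrix and a right-hand side
# that is (almost surely) not in the column space of A.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# Least squares solution: the x that minimizes ||b - A x||.
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# Residual b - A x; equation (1) says A^T r should vanish.
r = b - A @ x
print("A^T r =", A.T @ r)  # entries are zero up to rounding error

# Solving the normal equations A^T A x = A^T b directly.
# This assumes A has full column rank, so A^T A is invertible.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
print("same solution:", np.allclose(x, x_normal))
```

Each entry of `A.T @ r` is the dot product of one column of $A$ with the residual, so a vector of (numerical) zeros is exactly the statement that $b - Ax$ is orthogonal to the column space.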