Recently in my lectures we covered Householder reflectors and the normal equations for solving $Ax = b$, where $A$ is a rectangular $m\times n$ matrix with $m>n$ and $x$ is an $n\times 1$ vector. Or, more accurately, for minimizing the norm of the residual $r = Ax - b$.
Now I know how to do this, but I lack intuition on why it works and why simply projecting doesn't. That is, say $A$ has columns $a_1, a_2, \dots, a_n$; to solve the least squares problem, why isn't it enough to set $x_i = \frac{\langle a_i, b\rangle}{\left\|a_i\right\|^2}$ for $i = 1, 2, \dots, n$?
I see that I am doing something similar with the normal equations, i.e. multiplying both sides by $A^t$: I am again taking dot products of the columns of $A$ with $b$, and $(A^tA)^{-1}$ seems to play the role of the normalization, but the result doesn't work out to be the same.
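To make the discrepancy concrete, here is a small NumPy sketch (the matrices are made up for illustration): the per-column projection formula agrees with the least-squares solution only when the columns of $A$ are orthogonal, because then $A^tA$ is diagonal and $(A^tA)^{-1}$ reduces to dividing by $\|a_i\|^2$.

```python
import numpy as np

# A with non-orthogonal columns: per-column projection disagrees with least squares
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])

# Naive formula: x_i = <a_i, b> / ||a_i||^2
naive = np.array([A[:, i] @ b / (A[:, i] @ A[:, i]) for i in range(A.shape[1])])

# True least-squares solution (equivalent to solving the normal equations)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

print(naive)  # [1.  1.5]
print(x_ls)   # [-1.  2.]  -- not the same

# With orthogonal columns the two coincide, since A^T A is diagonal
Q = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])
naive_q = np.array([Q[:, i] @ b / (Q[:, i] @ Q[:, i]) for i in range(Q.shape[1])])
x_q, *_ = np.linalg.lstsq(Q, b, rcond=None)
print(np.allclose(naive_q, x_q))  # True
```

Intuitively, each naive coefficient projects $b$ onto one column in isolation, ignoring the overlap between columns; the $(A^tA)^{-1}$ factor is exactly what accounts for that overlap.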