I have a question about the normal equation. $A$ an $m\times n$ matrix with trivial nullspace, $y$ a vector outside the range of $A$. The vector $x$ that minimizes $|| Ax - y ||^2$ is the solution to $A^*Az = A^*y$.
How can I interpret what $A^*$ is doing? Why does multiplying $Ax = y$ by $A^*$ allow for a such a z exist? Is it just because $A^*A$ is invertible?
If it helps answers, I understand the derivation of the normal equation with the following argument. The vector $Az - y$ must be orthogonal. This means that ($Az - y, Ax)$ $=$ $(A^*(Az-y), x)$ = $0$ for all $x$, so $A^*Az = Ay$, where $(\bullet,\bullet)$ is the inner product.
Here's an interpretation you might like: we can write $$ y = y^{||} + y^{\perp} $$ where $y^{||} = A\hat x$ is the solution to the normal equation $A^*A \hat x = A^*y$, so that $y^{||}$ is the orthogonal projection of $y$ onto the image of $A$. In particular, $y^\perp = (y - A\hat x) = 0$ if and only if the equation $Ax = y$ has a solution.
With that in mind, $y^\perp$ must be in the perpendicular complement of the image of $A$. That is, for all $z \in \Bbb C^n$, we have $$ (Az, y^{\perp}) = 0 \implies (z,A^*y^{\perp}) = 0 $$ since this is true for all $z$, we have $A^*y^{\perp} = 0$. That is, because $y^\perp$ is in the orthogonal complement of the image of $A$, we have $A^*y^\perp = 0$.
So, we can express what $A^*$ "does" to the original equation as follows: $$ Ax = y \implies Ax = y^{||} + y^\perp \implies\\ A^*(A\hat x) = A^*y = A^*y^{||} + A^* y^\perp = A^*y^{||} $$ To convince that this equation is now solvable, I need to convince you that because $Ax = y^{||}$ has a solution, the equation $A^*x = A^*y^{||}$ has precisely the same solution. That is, $$ A \hat x = y^{||} \iff A^*A \hat x = A^*y^{||} $$ which is to say that the image of $A^*A$ is the entire image of $A^*$ and that $A^*A$ has the same kernel as $A$. This is probably a result that is proven in your text.
In a nutshell: multiplying both sides by $A^*$ maps the "error term" $y^\perp$ to $0$, which leaves us with a solvable equation. This gives us a solvable system because the restriction of $A^*$ to the image of $A$ is a surjective map to the image of $A^*$.