interpreting the effect of transpose in the normal equations


I have a question about the normal equation. Let $A$ be an $m\times n$ matrix with trivial nullspace, and let $y$ be a vector outside the range of $A$. The vector $z$ that minimizes $\| Az - y \|^2$ is the solution to $A^*Az = A^*y$.

How can I interpret what $A^*$ is doing? Why does multiplying $Ax = y$ by $A^*$ allow such a $z$ to exist? Is it just because $A^*A$ is invertible?

If it helps answerers, I understand the derivation of the normal equation via the following argument. The vector $Az - y$ must be orthogonal to the range of $A$. This means that $(Az - y, Ax) = (A^*(Az-y), x) = 0$ for all $x$, so $A^*Az = A^*y$, where $(\bullet,\bullet)$ is the inner product.
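As a numerical sanity check of this setup, the normal-equation solution can be compared against a library least-squares solver; here is a small NumPy sketch with made-up data:

```python
import numpy as np

# Made-up data: a 5x3 matrix (full column rank almost surely for
# Gaussian entries, i.e. trivial nullspace) and a y outside its range.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
y = rng.standard_normal(5)

# Solve the normal equations A^T A z = A^T y (real case, so A^* = A^T).
z = np.linalg.solve(A.T @ A, A.T @ y)

# Compare with the library minimizer of ||Az - y||^2.
z_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(z, z_lstsq))  # True
```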



BEST ANSWER

Here's an interpretation you might like: we can write $$ y = y^{||} + y^{\perp} $$ where $y^{||} = A\hat x$ and $\hat x$ is the solution to the normal equation $A^*A \hat x = A^*y$, so that $y^{||}$ is the orthogonal projection of $y$ onto the image of $A$. In particular, $y^\perp = y - A\hat x = 0$ if and only if the equation $Ax = y$ has a solution.

With that in mind, $y^\perp$ must lie in the orthogonal complement of the image of $A$. That is, for all $z \in \Bbb C^n$, we have $$ (Az, y^{\perp}) = 0 \implies (z,A^*y^{\perp}) = 0. $$ Since this holds for all $z$, we conclude $A^*y^{\perp} = 0$: because $y^\perp$ is in the orthogonal complement of the image of $A$, applying $A^*$ kills it.
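This orthogonality is easy to verify numerically; a small NumPy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2))   # full column rank almost surely
y = rng.standard_normal(6)

# x_hat solves the normal equations, so y_par = A x_hat is the
# orthogonal projection of y onto im(A).
x_hat = np.linalg.solve(A.T @ A, A.T @ y)
y_par = A @ x_hat
y_perp = y - y_par

# y_perp is orthogonal to every column of A, hence A^T y_perp = 0.
print(np.allclose(A.T @ y_perp, 0))  # True
```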

So, we can express what $A^*$ "does" to the original equation as follows: $$ Ax = y \implies Ax = y^{||} + y^\perp \implies\\ A^*(A\hat x) = A^*y = A^*y^{||} + A^* y^\perp = A^*y^{||} $$ To convince you that this equation is now solvable, I need to convince you that because $Ax = y^{||}$ has a solution, the equation $A^*Ax = A^*y^{||}$ has precisely the same solution. That is, $$ A \hat x = y^{||} \iff A^*A \hat x = A^*y^{||} $$ which is to say that the image of $A^*A$ is the entire image of $A^*$ and that $A^*A$ has the same kernel as $A$. This is probably a result that is proven in your text.


In a nutshell: multiplying both sides by $A^*$ maps the "error term" $y^\perp$ to $0$, which leaves us with a solvable equation; the system is solvable because the restriction of $A^*$ to the image of $A$ is a surjective map onto the image of $A^*$.
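The two facts this answer leans on, that $A^*A$ and $A^*$ have the same image and that $A^*A$ and $A$ have the same kernel, can also be checked numerically (a made-up real example, so $A^* = A^\top$):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
y = rng.standard_normal(5)

x_hat = np.linalg.solve(A.T @ A, A.T @ y)
y_par = A @ x_hat              # projection of y onto im(A)

# im(A^T A) = im(A^T): the two matrices have the same rank.
print(np.linalg.matrix_rank(A.T @ A) == np.linalg.matrix_rank(A.T))  # True

# A x = y_par is exactly solvable, and its solution also solves
# the projected normal equation A^T A x = A^T y_par.
x_exact, *_ = np.linalg.lstsq(A, y_par, rcond=None)
print(np.allclose(A @ x_exact, y_par))              # True
print(np.allclose(A.T @ A @ x_exact, A.T @ y_par))  # True
```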


Alternatively, by direct computation: we have $$ q = \lVert Ax - y \rVert_2^2 = \sum_i \left(\sum_j a_{ij} x_j - y_i \right)^2 $$ with gradient components $$ \partial_k q = 2 \sum_i \left( \sum_j a_{ij} x_j - y_i \right) a_{ik} \\ = 2 \left( \sum_{i,j} a_{ij} x_j a_{ik} - \sum_i a_{ik} y_i \right) \\ = 2 \left( \sum_i a_{ik} (A x)_i - (A^\top y)_k \right) \\ = 2 \left( (A^\top A x)_k - (A^\top y)_k \right). $$ Setting the gradient to zero, as required at a critical point (here a minimum, since $q$ is convex), gives $$ A^\top A x = A^\top y $$
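The analytic gradient $2(A^\top A x - A^\top y)$ from this derivation can be confirmed against central finite differences; a small NumPy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
y = rng.standard_normal(4)
x = rng.standard_normal(3)

def q(v):
    """Objective ||Av - y||_2^2."""
    return np.sum((A @ v - y) ** 2)

# Analytic gradient from the component-wise derivation.
grad = 2 * (A.T @ A @ x - A.T @ y)

# Central finite differences along each coordinate direction.
eps = 1e-6
num = np.array([(q(x + eps * e) - q(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(grad, num, atol=1e-5))  # True
```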