How do you prove that if $x$ is such that $A^T(b - Ax)=0$ then it minimizes $\| Ax - b \|^2$ without calculus?

122 Views Asked by At

I know this is is a well known fact linear algebra (and feel should be super obvious) but I got a little stuck.

I was to prove that choosing an $x \in \mathbb{R}^K$ such that $A^T(b - Ax) = 0$ is optimal when minimizing $\| b - Ax \|^2$. The intuition is obvious, we want the error vector $b - Ax$ to perpendicular to the column space of A i.e. $$(b - Ax) \perp C(A)$$ but for some reason, the mathematics didn't come out right.

This is what I have:

$$ \min_{x \in \mathbb{R}^K} \|Ax - b \|^2$$

we know:

$\|Ax - b \|^2 = (Ax -b)^T(Ax - b) = \| Ax \|^2 + \| b \|^2 - 2(Ax)^Tb = x^T A^T Ax - 2x^T A^T b + b^Tb$

So it suffices to minimize $x^T A^T Ax - 2x^T A^T b = x^T (A^T A x - 2A^T b)$ since $b^Tb$ is independent of our choice of $x$. Ideally we would want this to be zero:

$$x^T (A^T A x - 2A^T b) = 0$$

which seems to be possible since the minimization doesn't say that $x = 0$ is illegal. Which obviously $x = 0$ has to be wrong (for some reason) and doesn't match the initial intuition that I have.

Even if we didn't choose such an $x$ (which is wrong for some reason) we could also say that $x^T (A^T A x - 2A^T b) = 0$ means that either $x = 0$ or $A^T A x - 2A^T b = 0$. If we choose the second one we get something very close to the desired result:

$A^T A x - 2A^T b = A^T (A x - 2b) = 0$

which is nearly exactly what I wanted (we wanted $A^T(b - Ax) = 0$) but it seems its off by a factor of 2.

I thought the mistake of factor of 2 would have come from saying $ - (Ax)^T b - b^T (Ax) = -(Ax)^T b $ but I think that statement is correct. Or did I make some other mistake somewhere else?

So my issues are:

  1. Why is my reasoning for choosing $x = 0$ is wrong?
  2. Why is my equation $A^T (A x - 2b) = 0$ off by a factor of 2 with $b$.
  3. Even after those issue are resolved, how does one show optimality when $A^T(Ax - b ) = 0$
1

There are 1 best solutions below

0
On

We can first prove a general lemma:

Lemma: Let $b \in \mathbb R^m$ and let $V$ be a subspace of $\mathbb R^m$. Suppose $\hat b \in V$ and $\langle b - \hat b, v \rangle = 0$ for all $v \in V$. Then $\| b - \hat b \|_2 \leq \| b - v \|_2$ for all $ v\in V$.

Proof: Let $v \in V$. Then $\|b - v \|_2^2 = \| b - \hat b + \hat b - v \|_2^2 = \|b - \hat b \|_2^2 + \|\hat b - v \|_2^2 \geq \| b - \hat b \|_2^2.$ (The second equality follows from the Pythagorean theorem, which is applicable because $b - \hat b \perp \hat b - v$.)

Having established this lemma, the intuition you gave is now a rigorous proof. If $A^T(b - Ax^\star) = 0$, then the vector $b - A x^\star$ is orthogonal to every vector in the range of $A$. It follows that $\| b - A x^\star \|_2 \leq \| b - Ax \|_2$ for all $x$.