Proof of the normal equations theorem

551 Views Asked by At

Thm: Let the minimization problem be: $$ \min_{y \in \mathbb{R}^{n}}{\| Ay - b\|_{2}^{2}} = \| Ax - b\|_{2}^{2}$$

the problem admits a solution if and only if: $$ A^{\mathrm{ T }}Ax = A^{\mathrm{ T }}b$$

with A a tall matrix: $ A \in \mathbb{R}^{m \times n}$

b a vector: $ b \in \mathbb{R}^{m}$

and $m \geq n$

I wanted to demonstrate this theorem and I proceded this way so far:

$$ \| Ay - b\|_{2}^{2} = (Ay - b)^{\mathrm{ T }}(Ay - b) = (y^{\mathrm{ T }}A^{\mathrm{ T }} - b^{\mathrm{ T }})(Ay - b) = y^{\mathrm{ T }}A^{\mathrm{ T }}Ay - y^{\mathrm{ T }}A^{\mathrm{ T }}b - b^{\mathrm{ T }}Ay + b^{\mathrm{ T }}b \tag{*}\label{*} $$

Now calculate the derivative with respect to $y$ and impose it to be equal to zero

This is where I'm having problems.

I tried and this is what I get:

$$ \frac{ d }{ dy }\eqref{*} = 2A^{\mathrm{ T }}Ay - b^{\mathrm{ T }}A - b^{\mathrm{ T }}A $$

imposing it to be eqal to zero:

$$ 2A^{\mathrm{ T }}Ay - b^{\mathrm{ T }}A - b^{\mathrm{ T }}A = 0 $$

I get

$$ A^{\mathrm{ T }}Ax = b^{\mathrm{ T }}A$$

which is different from the Theorem. What am I doing wrong?

2

There are 2 best solutions below

0
On BEST ANSWER

Rather than expanding the function into 4 separate terms as your first step, you should collect everything into a single term. Differentiate that, and perform a change-of-variables as you proceed. This approach greatly reduces the visual clutter.

Define $x=(Ay-b).\,\,$ The function is then simple to differentiate $$\eqalign{ f &= x:x \cr df &= 2x:dx \,\,\,= 2x:A\,dy \,\,\,= 2A^Tx:dy \cr \frac{\partial f}{\partial y} &= 2A^Tx \cr }$$ where colon represents the inner/Frobenius product, i.e. $A:B={\rm tr}(A^TB)$. For vector arguments, it is simply the dot product.

Set the gradient to zero to obtain $$\eqalign{ A^Tx = 0 \cr A^T(Ay-b) = 0\cr A^TAy = A^Tb \cr }$$

4
On

As all terms are scalars, you can transpose them.

$$y^TA^Tb+b^TAy=2y^TA^Tb.$$