Thm: Let the minimization problem be: $$ \min_{y \in \mathbb{R}^{n}}{\| Ay - b\|_{2}^{2}} = \| Ax - b\|_{2}^{2}$$
the problem admits a solution if and only if: $$ A^{\mathrm{ T }}Ax = A^{\mathrm{ T }}b$$
with A a tall matrix: $ A \in \mathbb{R}^{m \times n}$
b a vector: $ b \in \mathbb{R}^{m}$
and $m \geq n$
I wanted to demonstrate this theorem and I proceded this way so far:
$$ \| Ay - b\|_{2}^{2} = (Ay - b)^{\mathrm{ T }}(Ay - b) = (y^{\mathrm{ T }}A^{\mathrm{ T }} - b^{\mathrm{ T }})(Ay - b) = y^{\mathrm{ T }}A^{\mathrm{ T }}Ay - y^{\mathrm{ T }}A^{\mathrm{ T }}b - b^{\mathrm{ T }}Ay + b^{\mathrm{ T }}b \tag{*}\label{*} $$
Now calculate the derivative with respect to $y$ and impose it to be equal to zero
This is where I'm having problems.
I tried and this is what I get:
$$ \frac{ d }{ dy }\eqref{*} = 2A^{\mathrm{ T }}Ay - b^{\mathrm{ T }}A - b^{\mathrm{ T }}A $$
imposing it to be eqal to zero:
$$ 2A^{\mathrm{ T }}Ay - b^{\mathrm{ T }}A - b^{\mathrm{ T }}A = 0 $$
I get
$$ A^{\mathrm{ T }}Ax = b^{\mathrm{ T }}A$$
which is different from the Theorem. What am I doing wrong?
Rather than expanding the function into 4 separate terms as your first step, you should collect everything into a single term. Differentiate that, and perform a change-of-variables as you proceed. This approach greatly reduces the visual clutter.
Define $x=(Ay-b).\,\,$ The function is then simple to differentiate $$\eqalign{ f &= x:x \cr df &= 2x:dx \,\,\,= 2x:A\,dy \,\,\,= 2A^Tx:dy \cr \frac{\partial f}{\partial y} &= 2A^Tx \cr }$$ where colon represents the inner/Frobenius product, i.e. $A:B={\rm tr}(A^TB)$. For vector arguments, it is simply the dot product.
Set the gradient to zero to obtain $$\eqalign{ A^Tx = 0 \cr A^T(Ay-b) = 0\cr A^TAy = A^Tb \cr }$$