Let A be an $m \times n $ matrix where $m \gt n$, $x \in \Bbb R^n, b\in \Bbb R^m$. Denote $|| \cdot || $ as $L^2$-norm
let $f(x)=||Ax-b||^2_2$
then $\nabla f(x)=A^TAx-A^Tb$
I have trouble in understanding why it is the case
Thank you!
Let A be an $m \times n $ matrix where $m \gt n$, $x \in \Bbb R^n, b\in \Bbb R^m$. Denote $|| \cdot || $ as $L^2$-norm
let $f(x)=||Ax-b||^2_2$
then $\nabla f(x)=A^TAx-A^Tb$
I have trouble in understanding why it is the case
Thank you!
On
We can use the multivariable chain rule. Note that $f(x) = g(h(x))$ where $h(x) = Ax - b$ and $g(y) = \| y \|_2^2$. From the chain rule, \begin{align} f'(x) &= g'(h(x)) h'(x) \\ &= 2(Ax - b)^TA . \end{align} It follows that $$ \nabla f(x) = f'(x)^T = 2 A^T(Ax - b). $$
In the above derivation, we used the fact that $h'(x) = A$ (here we are using the standard formula for the derivative of a linear transformation) and we also used the fact that $g'(y) = 2y^T$, which is easy to show.
Ok, to finish what I was talking about in the comments, notice that $x^TA^TAh$ and $b^TAh$ are real numbers and are so equal to their transpose. This gives us $$f(x+h)-f(x) = 2(h^T(A^TAx -A^Tb)) + h^TA^TAh.$$ From this and rearranging we can conclude that $$\frac{||f(x+h)-f(x) - 2(h^T(A^TAx -A^Tb))||_2}{||h||_2}\to 0 $$ as $h\to 0$. Hence $\nabla f=2(A^TAx -A^Tb)$.
Alternatively, $\frac{\partial }{\partial x_i}(Ax-b) = Ae_i$. Therefore by the chain rule, $$\frac{\partial d}{dx_i}(f) = (\frac{\partial }{\partial x_i} (Ax-b))^T(Ax-b) + (Ax-b)^T(\frac{\partial }{\partial x_i}(Ax-b)) = 2e_i^T(A^TAx-A^Tb). $$ Using similar reasoning to the above. Hence $\nabla f=2(A^TAx -A^Tb)$.