Difficulty with the derivative of $L_2$ norm


Derivative of L2 Norm


So I was under the impression that the derivative of the squared L2 norm of a vector x is just 2x, but the example in the screenshot I have linked to says otherwise. What gives? I can't figure out why there's an extra A transpose factor in the result for the derivative.


There is 1 solution below


You can use the chain rule for this problem. But for matrix/vector problems, the intermediate derivatives required by the chain rule often involve complicated third- and fourth-order tensors, so my preferred approach is to use successive changes of variables within differential expressions.

Define the variable $y=Ax+b$, so that $dy=A\,dx$. Then the squared norm (written in terms of the Frobenius product, which for vectors is just the dot product $a:b=a^Tb$) and its differential are
$$\eqalign{ f &= \|y\|_F^2 \cr &= y:y \cr\cr df &= 2y:dy \cr &= 2y:A\,dx \cr &= 2A^Ty:dx \cr }$$
Since $df=\big(\frac{\partial f}{\partial x}:dx\big)$, the gradient is
$$\eqalign{ \frac{\partial f}{\partial x} &= 2A^Ty = 2A^T(Ax+b) \cr }$$
Note that your initial impression is correct: with respect to $y$, the gradient is simply
$$\eqalign{ df &= 2y:dy \cr \frac{\partial f}{\partial y} &= 2y \cr }$$
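If you want to convince yourself numerically, here is a minimal sketch that compares $2A^T(Ax+b)$ against a central finite-difference estimate. The function names (`f`, `grad_f`, `numerical_grad`), the shapes, and the random test data are all assumptions made for illustration; none of them come from the question or its screenshot.

```python
import numpy as np

def f(x, A, b):
    """Squared L2 norm of y = A x + b."""
    y = A @ x + b
    return float(y @ y)

def grad_f(x, A, b):
    """Analytic gradient with respect to x: 2 A^T (A x + b)."""
    return 2.0 * A.T @ (A @ x + b)

def numerical_grad(x, A, b, eps=1e-6):
    """Central finite-difference estimate of the gradient, one coordinate at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (f(x + step, A, b) - f(x - step, A, b)) / (2 * eps)
    return g

# Hypothetical test data: A is 4x3, so x has 3 entries and y has 4.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)

analytic = grad_f(x, A, b)
numeric = numerical_grad(x, A, b)
print(np.allclose(analytic, numeric, atol=1e-5))
```

Setting `A` to the identity and `b` to zero makes $y=x$, and `grad_f` then returns $2x$, which is the special case the question had in mind.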