Derivative of the Euclidean norm of a matrix-vector product


Suppose we need to find the derivative $$ \dfrac{d||(Xw)^T||_2}{dw} $$ where $X$ is an $n \times m$ matrix and $w$ is an $m \times 1$ vector.

I know I need to apply the chain rule, but I am confused about how to handle both a norm and a transpose. Of course, I can rewrite it as $$ \dfrac{d||w^TX^T||_2}{dw} $$ but then I get $$ \dfrac{d||w^TX^T||_2}{dw} = \dfrac{d(||w^TX^T||_2)}{d(w^TX^T)} \dfrac{d(w^TX^T)}{dw} = \dfrac{Xw}{||w^TX^T||_2} X^T$$

This is obviously wrong, because the product $XwX^T$ is not defined in general: $Xw$ is $n \times 1$ while $X^T$ is $m \times n$. Where am I going wrong?



Best answer

Restate the problem $$\eqalign{ y &= Xw & \quad({\rm a\,convenient\,vector}) \\ \phi &= \|y^T\|_2 &= \|y\|_2 \quad({\rm the\,function}) \\ \phi^2 &= \|y\|^2_2 &= y^Ty \quad(\ldots{\rm squared}) \\ }$$ Starting with the squared function, calculate the differential, then the gradient. $$\eqalign{ 2\phi\,d\phi &= 2y^Tdy \;=\; 2y^TX\,dw \;=\; 2(X^Ty)^Tdw \\ d\phi &= \left(\frac{X^Ty}{\phi}\right)^Tdw \\ \frac{\partial\phi}{\partial w} &= \frac{X^Ty}{\phi} = \frac{X^TXw}{\|Xw\|_2} \\ }$$
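As a sanity check, the closed-form gradient $X^TXw/\|Xw\|_2$ can be compared against central finite differences. The sketch below is plain Python (no NumPy, so it is self-contained); the helper names `matvec`, `matTvec`, `phi`, `grad_phi`, `grad_fd`, the matrix sizes and the random test data are hypothetical choices for illustration, and the check assumes $Xw \neq 0$, where the norm is differentiable.

```python
import math
import random


def matvec(X, w):
    """Return X w for X stored as a list of rows (hypothetical helper)."""
    return [sum(x_jk * w_k for x_jk, w_k in zip(row, w)) for row in X]


def matTvec(X, y):
    """Return X^T y (hypothetical helper)."""
    m = len(X[0])
    return [sum(X[j][k] * y[j] for j in range(len(X))) for k in range(m)]


def norm(v):
    return math.sqrt(sum(v_i * v_i for v_i in v))


def phi(X, w):
    """The function phi(w) = ||X w||_2."""
    return norm(matvec(X, w))


def grad_phi(X, w):
    """Closed-form gradient X^T X w / ||X w||_2 (assumes X w != 0)."""
    y = matvec(X, w)
    phi_val = norm(y)
    return [g / phi_val for g in matTvec(X, y)]


def grad_fd(f, w, h=1e-6):
    """Central finite-difference gradient of a scalar function f at w."""
    g = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += h
        w_minus[i] -= h
        g.append((f(w_plus) - f(w_minus)) / (2 * h))
    return g


random.seed(0)
n, m = 4, 3  # hypothetical sizes: X is n x m, w is m x 1
X = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
w = [random.uniform(-1, 1) for _ in range(m)]

analytic = grad_phi(X, w)  # an m-vector, the same shape as w
numeric = grad_fd(lambda v: phi(X, v), w)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
print(max_err)
```

The printed error should be tiny (on the order of finite-difference noise), and the gradient has $m$ entries, matching the shape of $w$, which is exactly what the mismatched product in the question fails to produce.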

Other answer

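Your problem is that the chain rule is a lot more complicated with quantities that don't commute (matrices and vectors), so let's do something different. As we're differentiating a scalar with respect to a vector, the result will be a vector. It is easiest to start from the squared norm $\|Xw\|_2^2 = X_{jk}w_kX_{jl}w_l$ and use Einstein notation (repeated indices are summed) to find the $i$th component of its derivative,$$\frac{\partial}{\partial w_i}(X_{jk}w_kX_{jl}w_l)=X_{jk}X_{jl}(\delta_{ik}w_l+w_k\delta_{il})=X_{ji}X_{jl}w_l+X_{jk}w_kX_{ji}=2(X^TXw)_i.$$So the derivative of the squared norm is $2w^TX^TX$ (written as a row vector). Since $\|Xw\|_2 = \left(\|Xw\|_2^2\right)^{1/2}$, the scalar chain rule then gives the derivative of the norm itself,$$\frac{\partial\|Xw\|_2}{\partial w} = \frac{2w^TX^TX}{2\|Xw\|_2} = \frac{w^TX^TX}{\|Xw\|_2},$$valid wherever $Xw \neq 0$; up to transposition this agrees with $\dfrac{X^TXw}{\|Xw\|_2}$.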
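The index computation can be checked numerically. This plain-Python sketch spells out the sums over the repeated indices $j, k, l$ explicitly, compares the resulting $2X^TXw$ with a finite-difference gradient of $\|Xw\|_2^2$, and then divides by $2\|Xw\|_2$ to recover the gradient of the norm. The function names, sizes and random data are hypothetical, chosen only for illustration, and $Xw \neq 0$ is assumed.

```python
import math
import random


def squared_norm(X, w):
    """||Xw||_2^2 written as X_{jk} w_k X_{jl} w_l, summing j, k, l explicitly."""
    n, m = len(X), len(w)
    return sum(X[j][k] * w[k] * X[j][l] * w[l]
               for j in range(n) for k in range(m) for l in range(m))


def grad_squared_norm(X, w):
    """i-th component X_{ji} X_{jl} w_l + X_{jk} w_k X_{ji}, i.e. 2 (X^T X w)_i."""
    n, m = len(X), len(w)
    return [
        sum(X[j][i] * X[j][l] * w[l] for j in range(n) for l in range(m))
        + sum(X[j][k] * w[k] * X[j][i] for j in range(n) for k in range(m))
        for i in range(m)
    ]


def fd_grad(f, w, h=1e-6):
    """Central finite-difference gradient of a scalar function f at w."""
    g = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += h
        w_minus[i] -= h
        g.append((f(w_plus) - f(w_minus)) / (2 * h))
    return g


random.seed(1)
n, m = 5, 3  # hypothetical sizes for the check
X = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
w = [random.uniform(-1, 1) for _ in range(m)]

# Derivative of the squared norm from the index formula.
g_sq = grad_squared_norm(X, w)

# Chain rule for the norm itself: divide by 2 ||Xw||_2 (requires Xw != 0).
norm_Xw = math.sqrt(squared_norm(X, w))
g_norm = [g / (2 * norm_Xw) for g in g_sq]

err_sq = max(abs(a - b) for a, b in
             zip(g_sq, fd_grad(lambda v: squared_norm(X, v), w)))
err_norm = max(abs(a - b) for a, b in
               zip(g_norm, fd_grad(lambda v: math.sqrt(squared_norm(X, v)), w)))
print(err_sq, err_norm)
```

Both errors should be at the level of finite-difference noise; the factor of 2 in $2w^TX^TX$ belongs to the squared norm and cancels once the chain rule through the square root is applied.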