I'm asked to differentiate this $\dfrac{C}{2} \sum^{m}_{j' = 1} \| W^{j'} \|^{2}_{2}$ but I barely understand the notation.


I'm asked to differentiate

$$\dfrac{C}{2} \sum^{m}_{j' = 1} \| W^{j'} \|^{2}_{2},$$

with respect to $w^{j'}_{k}$, which is the $k$th weight of the weight vector $W^{j'}$.

  • It seems that $\| W^{j'} \|^{2}_{2}$ stands for the squared $L^2$ norm. This seems to indicate that we have: $$\| W^{j'} \|^{2}_{2} = w^{2}_1 + w^{2}_2 + ... + w^{2}_k$$
  • Then to my understanding $\sum^{m}_{j' = 1} \| W^{j'} \|^{2}_{2}$ seems to indicate that we have: $$\sum^{m}_{j' = 1} \| W^{j'} \|^{2}_{2} = (w^{2}_1 + w^{2}_2 + ... + w^{2}_k)_{j'=1} + (w^{2}_1 + w^{2}_2 + ... + w^{2}_k)_{j'=2} + ... + (w^{2}_1 + w^{2}_2 + ... + w^{2}_k)_{j'=m}$$

I have absolutely no clue how to differentiate this formula. Do I use the sum rule to get this? $$\dfrac{\partial }{\partial w^{j'}_{k}}\left(\dfrac{C}{2} \sum^{m}_{j' = 1} \| W^{j'} \|^{2}_{2}\right) = \dfrac{C}{2} \left[ (2w_k)_{j'=1} + (2w_k)_{j'=2} + ... + (2w_k)_{j'=m} \right]$$

Edit: This is from my machine learning course. This formula represents the regularization term we add to the loss function of an SVM (support vector machine) in order to minimize the objective function.

Here each $W^{j'}$ is a vector containing the scalars $w^{j'}_{k}$.
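
To make sure I'm reading the notation right, here is a small NumPy sketch (the shapes and the value of $C$ below are made up) that computes the term once via the squared $L^2$ norms and once via the explicit sum of squares:

```python
import numpy as np

C = 0.5                      # regularization constant (made-up value)
m, n = 3, 4                  # m weight vectors, each with n components (made-up shapes)
W = np.random.randn(m, n)    # row j holds the weight vector W^{j'}

# (C/2) * sum_{j'} ||W^{j'}||_2^2, using the norm directly ...
via_norms = (C / 2) * sum(np.linalg.norm(W[j]) ** 2 for j in range(m))

# ... and as the explicit sum of squares of every scalar w^{j'}_k.
via_squares = (C / 2) * np.sum(W ** 2)

print(np.isclose(via_norms, via_squares))   # True
```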


2 Answers

BEST ANSWER

You have $m$ vectors $W^1,\ldots,W^m$, each of length $n$. So all in all you have $mn$ variables $w^j_k$ with $1\leq j\leq m$ and $1\leq k\leq n$. By definition of the $L^2$-norm you have $$\frac{C}{2}\sum_{j=1}^m||W^j||_2^2=\frac{C}{2}\sum_{j=1}^m\sum_{k=1}^n(w^j_k)^2,$$ which is simply a sum of squares. So for any particular variable $w^j_k$ you have $$\frac{\partial}{\partial w^j_k}(w^i_l)^2=\begin{cases}2w^j_k&\text{ if $i=j$ and $l=k$}\\0&\text{ otherwise }\end{cases}$$ Then by linearity of derivatives it quickly follows that $$\frac{\partial}{\partial w^j_k}\frac{C}{2}\sum_{i=1}^m||W^i||_2^2=\frac{C}{2}\sum_{i=1}^m\sum_{l=1}^n\frac{\partial}{\partial w^j_k}(w^i_l)^2=Cw^j_k.$$
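
A quick numerical sanity check of this result (the shapes, the constant $C$, and the helper `phi` below are made up for the test) is to compare the analytic gradient $Cw^j_k$ against a central finite-difference derivative:

```python
import numpy as np

C = 0.5
m, n = 3, 4
W = np.random.randn(m, n)            # W[j] plays the role of the vector W^{j+1}

def phi(W):
    """(C/2) * sum_j ||W^j||_2^2, i.e. the regularization term."""
    return (C / 2) * np.sum(W ** 2)

analytic = C * W                     # gradient claimed above: d(phi)/d(w^j_k) = C * w^j_k

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros_like(W)
for j in range(m):
    for k in range(n):
        Wp, Wm = W.copy(), W.copy()
        Wp[j, k] += eps
        Wm[j, k] -= eps
        numeric[j, k] = (phi(Wp) - phi(Wm)) / (2 * eps)

print(np.allclose(analytic, numeric))   # True
```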

ANSWER

Use the standard basis vectors $\{e_k\}$ to extract the columns of the matrix, i.e. $$W^j = We_j$$ Let's also use a colon to denote the trace/Frobenius product, i.e. $$A:B \;=\; \sum_{i=1}^m\sum_{j=1}^nA_{ij}B_{ij} \;=\; {\rm Tr}(A^TB)$$ Then the objective function can be expressed in a form which is friendlier to matrix algebra. $$\eqalign{ \phi &= \dfrac{C}{2} \sum^{m}_{j' = 1} \| W^{j'} \|^{2}_{2} \\ &= \tfrac 12C\left(\sum_{j=1}^m We_j:We_j\right) \\ &= \tfrac 12C\left(\sum_{j=1}^m e_je_j^T:W^TW\right) \\ &= \tfrac 12C\Big(I:W^TW\Big) \\ &= \tfrac 12C\,W:W \\ }$$ Now calculating the gradient is easy. $$\eqalign{ d\phi &= \tfrac 12C\,(W:dW+dW:W) \\ &= CW:dW \\ \frac{\partial\phi}{\partial W} &= CW \\ }$$ If you wish to extract individual components, simply pre/post multiply by the basis vectors. $$\eqalign{ e_i^T\left(\frac{\partial\phi}{\partial W}\right)e_j &= Ce_i^TWe_j \\ \frac{\partial\phi}{\partial W_{ij}} &= CW_{ij} \\ }$$
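
The matrix form of the gradient can be checked the same way; the sketch below (shapes and $C$ again made up) verifies both the identity $\phi=\tfrac 12C\,{\rm Tr}(W^TW)$ and that $\partial\phi/\partial W = CW$ entry-wise via finite differences:

```python
import numpy as np

C = 0.5
n, m = 4, 3
W = np.random.randn(n, m)       # column j is the weight vector W^j = W e_j

def phi(W):
    """phi = (C/2) * W:W = (C/2) * Tr(W^T W)."""
    return (C / 2) * np.trace(W.T @ W)

# phi equals the sum of squared column norms, as derived above
phi_cols = (C / 2) * sum(np.linalg.norm(W[:, j]) ** 2 for j in range(m))
print(np.isclose(phi(W), phi_cols))      # True

# d(phi)/d(W_ij) = C * W_ij, checked entry-wise by central differences
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(n):
    for j in range(m):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (phi(Wp) - phi(Wm)) / (2 * eps)

print(np.allclose(C * W, numeric))       # True
```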