I'm asked to differentiate
$$\dfrac{C}{2} \sum^{m}_{j' = 1} \| W^{j'} \|^{2}_{2},$$
with respect to $w^{j'}_{k}$, the $k$th weight of the weight vector $W^{j'}$.
- It seems that $\| W^{j'} \|^{2}_{2}$ stands for the squared $L^2$ norm. If each vector $W^{j'}$ has $n$ components, this seems to indicate that we have: $$\| W^{j'} \|^{2}_{2} = w^{2}_1 + w^{2}_2 + \cdots + w^{2}_n$$
- Then to my understanding $\sum^{m}_{j' = 1} \| W^{j'} \|^{2}_{2}$ seems to indicate that we have: $$\sum^{m}_{j' = 1} \| W^{j'} \|^{2}_{2} = (w^{2}_1 + w^{2}_2 + \cdots + w^{2}_n)_{j'=1} + (w^{2}_1 + w^{2}_2 + \cdots + w^{2}_n)_{j'=2} + \cdots + (w^{2}_1 + w^{2}_2 + \cdots + w^{2}_n)_{j'=m}$$
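To make sure I read the double sum correctly, I checked it numerically (the sizes $m=3$, $n=4$ and $C=0.5$ below are just made-up values for illustration):

```python
import numpy as np

# Illustrative sizes (my choice): m = 3 weight vectors, each of length n = 4.
C = 0.5
m, n = 3, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))  # row j holds the vector W^j

# Sum of squared L2 norms over the m vectors...
reg = (C / 2) * sum(np.linalg.norm(W[j]) ** 2 for j in range(m))

# ...equals C/2 times the sum of squares of all m*n entries w^j_k.
assert np.isclose(reg, (C / 2) * np.sum(W ** 2))
```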
I have absolutely no clue how to differentiate this formula. Do I use the sum rule to get this? $$\dfrac{\partial}{\partial w^{j'}_{k}} \left( \dfrac{C}{2} \sum^{m}_{j' = 1} \| W^{j'} \|^{2}_{2} \right) = \dfrac{C}{2} \left[ (2w_k)_{j'=1} + (2w_k)_{j'=2} + \cdots + (2w_k)_{j'=m} \right]$$
Edit: This is from my machine learning course. This formula is the regularization term we add to the loss function of an SVM (support vector machine) when minimizing the objective function.
Here each $W^{j'}$ is a vector with scalar components $w^{j'}_{k}$.
You have $m$ vectors $W^1,\ldots,W^m$, each of length $n$. So all in all you have $mn$ variables $w^j_k$ with $1\leq j\leq m$ and $1\leq k\leq n$. By definition of the $L^2$-norm you have $$\frac{C}{2}\sum_{j=1}^m||W^j||_2^2=\frac{C}{2}\sum_{j=1}^m\sum_{k=1}^n(w^j_k)^2,$$ which is simply a sum of squares. So for any particular variable $w^j_k$ you have $$\frac{\partial}{\partial w^j_k}(w^i_l)^2=\begin{cases}2w^j_k&\text{ if $i=j$ and $l=k$}\\0&\text{ otherwise }\end{cases}$$ Then by linearity of derivatives it quickly follows that $$\frac{\partial}{\partial w^j_k}\frac{C}{2}\sum_{i=1}^m||W^i||_2^2=\frac{C}{2}\sum_{i=1}^m\sum_{l=1}^n\frac{\partial}{\partial w^j_k}(w^i_l)^2=Cw^j_k.$$
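You can confirm the result $\frac{\partial}{\partial w^j_k}\frac{C}{2}\sum_i||W^i||_2^2 = Cw^j_k$ with a finite-difference check (the sizes and the coordinate $(j,k)=(1,2)$ below are arbitrary choices for illustration):

```python
import numpy as np

C = 0.5
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))  # m = 3 vectors, each of length n = 4

def R(W):
    # The regularization term (C/2) * sum_j ||W^j||_2^2.
    return (C / 2) * np.sum(W ** 2)

# Central finite difference at one coordinate (j, k) = (1, 2).
eps = 1e-6
E = np.zeros_like(W)
E[1, 2] = eps
numeric = (R(W + E) - R(W - E)) / (2 * eps)

# The claimed analytic derivative: C * w^j_k.
analytic = C * W[1, 2]
assert np.isclose(numeric, analytic)
```

Perturbing any single coordinate leaves every other square in the double sum unchanged, which is exactly the case analysis in the derivation above.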