There is a great explanation of the calculation of backpropagation gradient in the CS231n class. Please find the question here.
twolffpiggott's answer improved my general understanding. However, I've gotten stuck on one of the derivations. My first question is about this formula:
$$ \frac{\partial\mathcal{L}_i}{\partial \boldsymbol{w_j}} = \sum_{k=1}^{K} \frac{\partial\mathcal{L}_i}{\partial f_k} \times \frac{\partial f_k}{\partial \boldsymbol{w_j}} .$$
How do you get from this to the second line, which is: $$ \frac{\partial\mathcal{L}_i}{\partial f_j} \times \frac{\partial f_j}{\partial \boldsymbol{w_j}}?$$ In other words, how did the $\sum$ disappear in the second line?
The second question is about $k$. May I kindly ask what $k$ is in those lines? Is it different from $j$?
Thanks in advance for your time.
As written in the answer you mention, the sum disappears because, among all the $f_k$, only $f_j$ depends on $\boldsymbol{w_j}$. Therefore, for every $k \neq j$, $\dfrac{\partial f_k}{\partial \boldsymbol{w_j}}=0$, and the only surviving term in the sum is the one with $k = j$.
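You can check this numerically. Below is a minimal NumPy sketch (the weight matrix `W`, input `x`, and the perturbation are arbitrary example values, not from the original post): the scores are $f_k = \boldsymbol{w_k} \cdot \boldsymbol{x}$, so nudging only the row $\boldsymbol{w_j}$ changes only the score $f_j$.

```python
import numpy as np

# Arbitrary example values: 4 classes (rows of W), 3 input features.
W = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.],
              [0., 1., 0.]])
x = np.array([1., -1., 2.])

f = W @ x            # scores: f_k = w_k . x

j = 1                # perturb only the row w_j
eps = 1e-6
W_pert = W.copy()
W_pert[j] += eps     # small nudge to every entry of w_j
f_pert = W_pert @ x

# Only f_j moves; all other scores are untouched,
# i.e. df_k/dw_j = 0 for k != j.
changed = np.abs(f_pert - f) > 1e-9
print(changed)       # [False  True False False]
```

This is exactly why the chain-rule sum over $k$ collapses to the single $k = j$ term.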
$k$ and $j$ are just index names. $k$ is a "dummy" (summation) index: it only labels the terms being summed, so you could rename it freely, even swapping $k$ and $j$ throughout, and the equations would say the same thing. By convention we tend to use $k$ for the summation index and $j$ for the particular fixed index, but that is only a choice of notation.