I am trying to understand the derivation of backpropagation for recurrent neural networks (RNNs) from this source: https://github.com/go2carter/nn-learn/blob/master/grad-deriv-tex/rnn-grad-deriv.pdf
I am stuck on the following equation:
$\frac{\partial}{\partial V_{ij}}(V_{lm}s_{m}) = \delta_{il} \delta_{jm} s_{m}$
I don't understand where these two Kronecker deltas come from.
Note: This is a cross-post from https://stats.stackexchange.com/questions/434609/taking-derivative-for-rnn-back-propogation
I am answering my own question:
From the Matrix Cookbook, I got this identity:
$\frac{\partial X_{kl}}{\partial X_{ij}} = \delta_{ik} \delta_{lj}$
It holds because the entries of $X$ are independent variables: the derivative of $X_{kl}$ with respect to $X_{ij}$ is $1$ when $k=i$ and $l=j$, and $0$ otherwise, which is exactly $\delta_{ik} \delta_{lj}$.
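As a sanity check (a minimal sketch using NumPy, not from the original derivation), one can confirm the identity numerically: perturbing a single entry $X_{ij}$ and watching which entries $X_{kl}$ respond reproduces the double Kronecker delta.

```python
import numpy as np

# Numerical check of dX_kl/dX_ij = delta_ik * delta_lj:
# perturb one entry X[i, j] and see which entries of X respond.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
i, j, eps = 1, 2, 1e-6

Xp = X.copy()
Xp[i, j] += eps
grad = (Xp - X) / eps  # grad[k, l] approximates dX_kl/dX_ij

expected = np.zeros_like(X)
expected[i, j] = 1.0   # delta_ik * delta_lj is 1 only at (k, l) = (i, j)
assert np.allclose(grad, expected)
```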
Therefore:
$\frac{\partial V_{lm}}{\partial V_{ij}} = \delta_{il} \delta_{mj}$
Since $s_m$ does not depend on $V_{ij}$, the derivative passes through the implicit sum over $m$ (in Einstein notation, $m$ is a repeated dummy index and is summed over, while $l$, $i$, and $j$ are free):
$\frac{\partial (V_{lm}s_m)}{\partial V_{ij}} = \frac{\partial V_{lm}}{\partial V_{ij}} s_m = \delta_{il} \delta_{mj} s_m$
The Kronecker delta is symmetric, so $\delta_{mj} = \delta_{jm}$, which gives the equation from the question:
$\frac{\partial (V_{lm}s_m)}{\partial V_{ij}} = \delta_{il} \delta_{jm} s_m$
Contracting $\delta_{jm}$ with $s_m$ simplifies this further to $\delta_{il} s_j$.
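The final result can also be verified numerically (a small sketch I wrote with NumPy, using the simplified form $\delta_{il} s_j$, where the matrix and vector shapes are arbitrary choices):

```python
import numpy as np

# Numerical check of d(V s)_l / dV_ij = delta_il * delta_jm * s_m = delta_il * s_j:
# perturbing V[i, j] changes only component i of V @ s, by s[j].
rng = np.random.default_rng(0)
V = rng.standard_normal((3, 4))
s = rng.standard_normal(4)
eps = 1e-6

for i in range(3):
    for j in range(4):
        Vp = V.copy()
        Vp[i, j] += eps
        grad = (Vp @ s - V @ s) / eps  # grad[l] approximates d(V s)_l / dV_ij
        expected = np.zeros(3)
        expected[i] = s[j]             # delta_il picks row i; delta_jm s_m = s_j
        assert np.allclose(grad, expected, atol=1e-4)
```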