Why do we sum the derivatives of the loss w.r.t. the weights at each time step in RNN back-propagation?


I am reading a paper explaining the derivation of the back-propagation equations in RNNs. There I read: 'Note that the weight matrix remains the same across all time steps, so we can differentiate with respect to it at each time step and sum the results together.'


My question is: why is this statement correct? What is its mathematical derivation?
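To make the claim concrete, here is a minimal numerical check I put together (my own toy example, not the paper's notation): a scalar linear RNN where I pretend the shared weight `w` is a separate variable `w_t` at each step, compute each per-step partial, sum them, and compare against a finite-difference estimate of the true total derivative.

```python
# Toy scalar linear RNN: h_t = w * h_{t-1} + x_t, with loss L = h_T.
# Claim being checked: summing the per-time-step partials dL/dw_t
# (treating the shared w as a distinct w_t at each step) gives the
# total derivative dL/dw.

def forward(w, xs):
    h = 0.0
    hs = [h]                      # hs[t] stores h_t, with h_0 = 0
    for x in xs:
        h = w * h + x
        hs.append(h)
    return hs                     # loss is L = hs[-1] = h_T

w, xs = 0.7, [1.0, -0.5, 2.0, 0.3]
T = len(xs)
hs = forward(w, xs)

# Per-step partial via the chain rule:
# dL/dw_t = (dh_T/dh_t) * (dh_t/dw_t) = w**(T - t) * h_{t-1}
per_step = [w ** (T - t) * hs[t - 1] for t in range(1, T + 1)]
summed = sum(per_step)

# Finite-difference estimate of the total derivative dL/dw
eps = 1e-6
numeric = (forward(w + eps, xs)[-1] - forward(w - eps, xs)[-1]) / (2 * eps)

print(summed, numeric)            # the two values agree (~2.77)
```

The agreement is just the multivariable chain rule: since `w` enters the computation once per time step, the total derivative is the sum of the partials taken through each occurrence.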

Your advice will be appreciated.