Gradient calculation via backprop in RNN

I'm trying to understand the gradient calculation via backpropagation through time in an RNN, but I'm unable to follow one step. Specifically, I cannot see how to get from \begin{equation} \frac{\partial E_3}{\partial W}= \frac{\partial E_3}{\partial \hat{y}_3}\frac{\partial \hat{y}_3}{\partial s_3}\frac{\partial s_3}{\partial W} \end{equation} to \begin{equation} \frac{\partial E_3}{\partial W}= \sum_{k=0}^{3} \frac{\partial E_3}{\partial \hat{y}_3}\frac{\partial \hat{y}_3}{\partial s_3}\frac{\partial s_3}{\partial s_k}\frac{\partial s_k}{\partial W}. \end{equation} I know the chain rule, but apparently not well enough here.
Can someone help me with this? Thanks!

2 Answers

BEST ANSWER

In backpropagation, the $\sum$ comes from the total derivative rule (the multivariable chain rule). The rule you need is $$\text{if}\quad g(t) = f(x(t),y(t)), \quad \text{then} \quad g'(t) = \frac{\partial f}{\partial x} x'(t)+\frac{\partial f}{\partial y} y'(t).$$
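
To make the missing step explicit, apply this rule to the recurrence itself. Assuming the standard form $s_t = f(Ux_t + Ws_{t-1})$ (the exact form is not stated in the question, so this is an assumption), $s_3$ depends on $W$ directly and also through $s_2$, which in turn depends on $W$ directly and through $s_1$, and so on. Applying the total derivative rule at each step and unrolling gives
\begin{equation}
\frac{\partial s_3}{\partial W} = \left.\frac{\partial s_3}{\partial W}\right|_{s_2\ \text{fixed}} + \frac{\partial s_3}{\partial s_2}\frac{\partial s_2}{\partial W} = \sum_{k=0}^{3} \frac{\partial s_3}{\partial s_k}\frac{\partial s_k}{\partial W},
\end{equation}
where each $\frac{\partial s_k}{\partial W}$ inside the sum denotes the immediate derivative taken with $s_{k-1}$ held fixed, $\frac{\partial s_3}{\partial s_3} = 1$ supplies the $k=3$ term, and the $k=0$ term vanishes when $s_0$ is a constant initial state. Multiplying through by $\frac{\partial E_3}{\partial \hat{y}_3}\frac{\partial \hat{y}_3}{\partial s_3}$ recovers the summed formula in the question.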

ANOTHER ANSWER

Observe that
\begin{align}
\frac{\partial s_3}{\partial s_k} = \begin{cases} 1 & \text{if } k=3,\\ 0 & \text{otherwise.} \end{cases}
\end{align}
Hence it follows that
\begin{align}
\sum^3_{k=0}\frac{\partial s_3}{\partial s_k}=1,
\end{align}
which means
\begin{align}
\frac{\partial E_3}{\partial W}\times 1 &= \frac{\partial E_3}{\partial \hat{y}_3}\frac{\partial \hat{y}_3}{\partial s_3}\frac{\partial s_3}{\partial W}\times \sum^3_{k=0}\frac{\partial s_3}{\partial s_k} = \frac{\partial E_3}{\partial \hat{y}_3}\frac{\partial \hat{y}_3}{\partial s_3}\sum^3_{k=0}\frac{\partial s_3}{\partial s_k}\frac{\partial s_3}{\partial W}\\
&= \frac{\partial E_3}{\partial \hat{y}_3}\frac{\partial \hat{y}_3}{\partial s_3} \sum^3_{k=0} \frac{\partial s_3}{\partial s_k}\frac{\partial s_k}{\partial W}.
\end{align}
The last equality holds because only the $k=3$ term of the sum is nonzero, and for $k=3$ we have $\frac{\partial s_k}{\partial W} = \frac{\partial s_3}{\partial W}$, so the two factors may be swapped inside the sum.
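
For readers who want to check the summed formula numerically, here is a minimal sketch in Python. It assumes a scalar RNN $s_t = \tanh(u x_t + w s_{t-1})$ with output $\hat{y}_3 = w_{\text{out}} s_3$ and loss $E_3 = \frac{1}{2}(\hat{y}_3 - y_3)^2$; this recurrence, the loss, and all variable names are illustrative assumptions, not taken from the question. The BPTT gradient computed via the $\sum_k$ formula is compared against a central finite difference.

```python
import numpy as np

# Assumed toy setup: s_t = tanh(u*x_t + w*s_{t-1}), yhat = w_out * s_3,
# E_3 = 0.5 * (yhat - y)^2.  Chosen only to illustrate the summed formula.

def forward(w, u, w_out, xs, s0=0.0):
    s = [s0]                          # s[0] = s_0 (fixed initial state)
    for x in xs:
        s.append(np.tanh(u * x + w * s[-1]))
    return s, w_out * s[-1]           # all states and yhat

def loss(w, u, w_out, xs, y):
    _, yhat = forward(w, u, w_out, xs)
    return 0.5 * (yhat - y) ** 2

def bptt_grad_w(w, u, w_out, xs, y):
    """dE_3/dw as the sum over k of
    (dE/dyhat)(dyhat/ds_3)(ds_3/ds_k)(immediate ds_k/dw)."""
    s, yhat = forward(w, u, w_out, xs)
    fprime = [1.0 - s[t] ** 2 for t in range(1, len(s))]  # tanh' at each step
    dE_dyhat, dyhat_ds3 = yhat - y, w_out
    grad, ds3_dsk = 0.0, 1.0          # ds_3/ds_3 = 1 starts the chain
    for k in range(len(xs), 0, -1):   # k = 3, 2, 1; the k = 0 term is zero
        # immediate ds_k/dw (s_{k-1} held fixed) is f'(a_k) * s_{k-1}
        grad += dE_dyhat * dyhat_ds3 * ds3_dsk * fprime[k - 1] * s[k - 1]
        ds3_dsk *= fprime[k - 1] * w  # extend ds_3/ds_k one step further back
    return grad

xs, y = [0.5, -1.0, 2.0], 0.3
w, u, w_out, eps = 0.7, 0.4, 1.2, 1e-6
numeric = (loss(w + eps, u, w_out, xs, y)
           - loss(w - eps, u, w_out, xs, y)) / (2 * eps)
print(bptt_grad_w(w, u, w_out, xs, y), numeric)  # the two should agree closely
```

The $k=0$ term is omitted in the loop because $s_0$ is a fixed initial state, so its immediate derivative with respect to $w$ vanishes; that is also why the sum in the question can start at $k=0$ without changing the result.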