Recursive chain rule


Given the following equations: $$ \begin{aligned} o_t&=\sigma(x_t, h_{t-1};W_o) \\ \tilde{c}_t&=\tanh(x_t, h_{t-1};W_g) \\ f_t&=\sigma(x_t, h_{t-1};W_f) \\ i_t&=\sigma(x_t, h_{t-1};W_i) \\ c_t&=f_t\odot{c_{t-1}}+i_t\odot{\tilde{c}_t} \\ &=\sigma(x_t, h_{t-1};W_f)\odot{c_{t-1}}+\sigma(x_t, h_{t-1};W_i)\odot{\tanh(x_t, h_{t-1};W_g)} \\ h_t&=o_t\odot\tanh(c_t) \\ \end{aligned} $$ Lower-case variables are vectors and the $W$'s are matrices. $x_t$ is a new input at each time step. The subscript $t$ denotes time, $t=1,\dots,N$, and $h_0$ and $c_0$ are zero vectors. The equations are recursive: $h_t$ is a function of $c_t$, which in turn is a function of $c_{t-1}$.
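To make the recursion concrete, here is a minimal NumPy sketch of one time step. It rests on an assumption not stated above: that $\sigma(x_t, h_{t-1};W)$ means the function applied elementwise to $W[x_t; h_{t-1}]$ (concatenated input, no bias term). The name `lstm_step`, the dimensions, and the random weights are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_o, W_g, W_f, W_i):
    """One step of the recursion; each W has shape (n_h, n_x + n_h)."""
    # Assumption: every gate acts on the concatenation [x_t; h_{t-1}].
    z = np.concatenate([x_t, h_prev])
    o_t = sigmoid(W_o @ z)
    c_tilde = np.tanh(W_g @ z)
    f_t = sigmoid(W_f @ z)
    i_t = sigmoid(W_i @ z)
    c_t = f_t * c_prev + i_t * c_tilde   # elementwise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
n_x, n_h, N = 3, 4, 5                    # illustrative sizes
W_o, W_g, W_f, W_i = (rng.normal(size=(n_h, n_x + n_h)) for _ in range(4))

h, c = np.zeros(n_h), np.zeros(n_h)      # h_0 and c_0 are zero vectors
for t in range(1, N + 1):
    x_t = rng.normal(size=n_x)           # a new input at each time step
    h, c = lstm_step(x_t, h, c, W_o, W_g, W_f, W_i)
```

Note that `c_prev` enters `lstm_step` directly, while `h_prev` was itself computed from the previous cell state one step earlier.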

I'm trying to derive: $$\frac{\partial c_t}{\partial c_{t-1}}=\frac{\partial}{\partial c_{t-1}}(f_t\odot{c_{t-1}}+i_t\odot{\tilde{c}_t}).$$

So my question is: given that $f_t$, $i_t$ and $\tilde{c}_t$ are functions of $h_{t-1}$, which is itself a function of $c_{t-1}$, does that dependence matter? Is this derivative just equal to $f_t$? Or do the other terms need to be accounted for, so that the derivation is something like: $$\frac{\partial c_t}{\partial c_{t-1}}=\frac{\partial{f_t}}{\partial{c_{t-1}}}\odot{c_{t-1}}+f_t\odot\frac{\partial{c_{t-1}}}{\partial{c_{t-1}}}+\frac{\partial{i_t}}{\partial{c_{t-1}}}\odot{\tilde{c}_t}+i_t\odot\frac{\partial{\tilde{c}_t}}{\partial{c_{t-1}}}?$$
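One way to probe the two readings numerically is a finite-difference Jacobian of $c_t$ with respect to $c_{t-1}$, computed once with $h_{t-1}$ held fixed and once with $h_{t-1}=o_{t-1}\odot\tanh(c_{t-1})$ recomputed from the perturbed cell state. The sketch below assumes that $\sigma(x_t, h_{t-1};W)$ means the function applied to $W[x_t; h_{t-1}]$, and it uses a random stand-in for $o_{t-1}$ (which, by the equations above, depends on $x_{t-1}$ and $h_{t-2}$ rather than on $c_{t-1}$). The names `c_next` and `jacobian_fd` are hypothetical helpers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def c_next(c_prev, h_prev, x_t, W_f, W_i, W_g):
    """Return (c_t, f_t) for given c_{t-1} and h_{t-1}."""
    z = np.concatenate([x_t, h_prev])    # assumed form of the gate argument
    f_t = sigmoid(W_f @ z)
    i_t = sigmoid(W_i @ z)
    c_tilde = np.tanh(W_g @ z)
    return f_t * c_prev + i_t * c_tilde, f_t

def jacobian_fd(fn, v, eps=1e-6):
    """Central-difference Jacobian of fn at v, built column by column."""
    J = np.zeros((fn(v).size, v.size))
    for k in range(v.size):
        d = np.zeros(v.size)
        d[k] = eps
        J[:, k] = (fn(v + d) - fn(v - d)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
n_x, n_h = 3, 4                          # illustrative sizes
W_f, W_i, W_g = (rng.normal(size=(n_h, n_x + n_h)) for _ in range(3))
x_t = rng.normal(size=n_x)
c_prev = rng.normal(size=n_h)
o_prev = sigmoid(rng.normal(size=n_h))   # stand-in for o_{t-1}
h_prev = o_prev * np.tanh(c_prev)

_, f_t = c_next(c_prev, h_prev, x_t, W_f, W_i, W_g)

# Reading 1: h_{t-1} treated as a constant (direct dependence only).
J_fixed = jacobian_fd(
    lambda c: c_next(c, h_prev, x_t, W_f, W_i, W_g)[0], c_prev)

# Reading 2: h_{t-1} recomputed from the perturbed c_{t-1} (all paths).
J_total = jacobian_fd(
    lambda c: c_next(c, o_prev * np.tanh(c), x_t, W_f, W_i, W_g)[0], c_prev)

print("max |J_fixed - diag(f_t)|:", np.max(np.abs(J_fixed - np.diag(f_t))))
print("max |J_total - diag(f_t)|:", np.max(np.abs(J_total - np.diag(f_t))))
```

Under these assumptions the first Jacobian should agree with $\operatorname{diag}(f_t)$, while the second also picks up the contributions that flow through $h_{t-1}$ into $f_t$, $i_t$ and $\tilde{c}_t$. Which of the two is "the" derivative depends on whether $h_{t-1}$ is regarded as an independent variable or as a function of $c_{t-1}$.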

Any help would be greatly appreciated.