Matrix calculus for RNN equations


In the Deep Learning book (Goodfellow et al., chapter 10), the standard RNN is defined by a set of update equations, and the book then derives various gradients, including the one with respect to the recurrent weight matrix $W$.
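For reference, I believe the forward equations and the $W$ gradient in question are the following (reproduced from memory, so the exact form and numbering in the book may differ slightly):

$$
\begin{aligned}
a^{(t)} &= b + W h^{(t-1)} + U x^{(t)},\\
h^{(t)} &= \tanh\!\left(a^{(t)}\right),\\
\nabla_W L &= \sum_t \operatorname{diag}\!\left(1 - \big(h^{(t)}\big)^2\right)\left(\nabla_{h^{(t)}} L\right) {h^{(t-1)}}^{\top}.
\end{aligned}
$$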

I understand that:

  • $1 - {(h^{(t)})}^2 $ is coming from the derivative of $tanh$
  • $h^{(t-1)}$ is coming from the chain rule

What I don't understand:

  • where is the $diag$ coming from?
  • where is the transpose on $h^{(t-1)}$ coming from?
  • why are the factors in this particular order (apart from the fact that this way the dimensions match)? It feels like the gradient of $L$ somehow ended up between the two parts of the derivative of $h^{(t)}$.
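To make the question concrete, here is a small NumPy sketch of a single time step, under my assumption that $h^{(t)} = \tanh(b + W h^{(t-1)} + U x^{(t)})$ and a toy squared-error loss (the loss and all shapes are made up for illustration). It computes the per-step gradient with the $\operatorname{diag}(1 - h^2)\,(\nabla_h L)\,h_{prev}^{\top}$ formula and checks it against finite differences, which at least confirms the formula and the ordering are correct, even though I don't see where they come from:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                          # hidden size, input size (arbitrary)
W = rng.standard_normal((n, n))      # recurrent weights
U = rng.standard_normal((n, m))      # input weights
b = rng.standard_normal(n)
h_prev = rng.standard_normal(n)      # h^{(t-1)}
x = rng.standard_normal(m)           # x^{(t)}
target = rng.standard_normal(n)

def forward(W):
    """One RNN step plus a toy squared-error loss on h^{(t)}."""
    h = np.tanh(b + W @ h_prev + U @ x)
    return h, 0.5 * np.sum((h - target) ** 2)

h, L = forward(W)
dL_dh = h - target                   # ∇_{h^{(t)}} L for this toy loss

# The formula in question: diag(1 - h^2) (∇_h L) h_prev^T
grad_formula = np.diag(1 - h ** 2) @ np.outer(dL_dh, h_prev)

# Finite-difference check, entry by entry
eps = 1e-6
grad_numeric = np.zeros_like(W)
for i in range(n):
    for j in range(n):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad_numeric[i, j] = (forward(Wp)[1] - forward(Wm)[1]) / (2 * eps)

print(np.max(np.abs(grad_formula - grad_numeric)))  # small, close to zero
```

Writing the formula entrywise, $[\nabla_W L]_{ij} = (\nabla_h L)_i\,(1 - h_i^2)\,(h_{prev})_j$, which is exactly what the `diag`-sandwich-outer-product expression produces, but I'd still like to understand why the derivation naturally lands in that matrix form.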

Thank you