How to derive the gradient of an RNN, and what is the definition of the loss function in this graph?

I am reading *Deep Learning* and I am not able to follow the gradient derivation for RNNs.

The graph of the RNN looks like this: [Graph]

The update equations are as follows: [Equations]

The loss function is: [Loss]

And the gradient derivation goes like this: [Derivation]

I am confused by equation 10.18. What is the loss function here, and why does the following hold? $$(\nabla_{\boldsymbol{o}^{(t)}}L)_i=\frac{\partial L}{\partial o_i^{(t)}}=\widehat{y}_i^{(t)}-\textbf{1}_{i=y^{(t)}}$$

BEST ANSWER

Inspired by this post: [Softmax and the negative log-likelihood]

I write my own Softmax function as: $$\widehat{y}_i^{(t)}=\frac{e^{o_i^{(t)}}}{\sum_j{e^{o_j^{(t)}}}}$$ and its derivative with respect to $o_i^{(t)}$ is:

$$\frac{\partial{\widehat{y}_i^{(t)}}} {\partial{o_i^{(t)}}} =\widehat{y}_i^{(t)}(1-\widehat{y}_i^{(t)})$$
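As a quick sanity check of this diagonal derivative, here is a minimal NumPy sketch of my own (not from the book; the toy logits are arbitrary) comparing it against finite differences:

```python
import numpy as np

def softmax(o):
    # y_i = exp(o_i) / sum_j exp(o_j), shifted by max(o) for stability
    e = np.exp(o - o.max())
    return e / e.sum()

o = np.array([1.0, 2.0, 0.5])  # arbitrary toy logits o^(t)
y = softmax(o)

eps = 1e-6
for i in range(len(o)):
    o_plus = o.copy()
    o_plus[i] += eps
    numeric = (softmax(o_plus)[i] - y[i]) / eps  # d yhat_i / d o_i
    analytic = y[i] * (1.0 - y[i])
    print(i, numeric, analytic)  # the two columns agree to ~1e-6
```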

My negative log-likelihood is written as: $$L^{(t)}=-\sum_{i}\log{\widehat{y}_i^{(t)}}$$ and its derivative with respect to $\widehat{y}_i^{(t)}$ is:

$$\frac{\partial{L^{(t)}}}{\partial{\widehat{y}_i^{(t)}}}=-\frac{1}{\widehat{y}_i^{(t)}}$$
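The same kind of finite-difference check works for this derivative (again a small sketch with arbitrary toy probabilities):

```python
import numpy as np

yhat = np.array([0.2, 0.5, 0.3])  # arbitrary toy values of yhat^(t)

def loss(y):
    # L = -sum_i log(y_i), the loss as written above
    return -np.log(y).sum()

eps = 1e-6
for i in range(len(yhat)):
    y_plus = yhat.copy()
    y_plus[i] += eps
    numeric = (loss(y_plus) - loss(yhat)) / eps
    print(i, numeric, -1.0 / yhat[i])  # matches -1 / yhat_i
```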

Combining the equations above, I get:

$$\frac{\partial{L^{(t)}}}{\partial{o_i^{(t)}}}=\frac{\partial{L^{(t)}}}{\partial{\widehat{y}_i^{(t)}}} \frac{\partial{\widehat{y}_i^{(t)}}}{\partial{o_i^{(t)}}}=-\frac{1}{\widehat{y}_i^{(t)}}[\widehat{y}_i^{(t)}(1-\widehat{y}_i^{(t)})]=\widehat{y}_i^{(t)}-1 $$
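Putting the pieces together in code (a minimal sketch; note that since every $\widehat{y}_j^{(t)}$ depends on every $o_i^{(t)}$, a full chain rule would sum over $j$, so the finite-difference gradient below is one way to test whether the single-term version above suffices):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def loss(o):
    # L = -sum_i log(softmax(o)_i), the composition derived above
    return -np.log(softmax(o)).sum()

o = np.array([1.0, 2.0, 0.5])  # arbitrary toy logits
yhat = softmax(o)

eps = 1e-6
numeric = np.array([(loss(o + eps * np.eye(len(o))[i]) - loss(o)) / eps
                    for i in range(len(o))])

print(numeric)      # finite-difference gradient dL/do_i
print(yhat - 1.0)   # the analytic result derived above, yhat_i - 1
```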

I have two questions now:

  1. Is my derivation above correct?

  2. If it is, why is there a small difference between my result and the one in the book *Deep Learning*, which has $\textbf{1}_{i=y^{(t)}}$ in place of the $1$? What does $\textbf{1}_{i=y^{(t)}}$ mean, and can it be replaced by a simple $1$?
