I am reading *Deep Learning* and I am not able to follow the gradient derivation for the RNN.
The graph of RNN is like this:

The update equations are as follows:

And the derivation of gradient is like this:

I am confused by equation 10.18.
What is the loss function here, and why does the following hold:


Inspired by this post: *Softmax and the negative log-likelihood*,
I write my own softmax function as $$\widehat{y}_i^{(t)}=\frac{e^{o_i^{(t)}}}{\sum_j{e^{o_j^{(t)}}}}$$ and its derivative with respect to $o_i^{(t)}$:
$$\frac{\partial{\widehat{y}_i^{(t)}}} {\partial{o_i^{(t)}}} =\widehat{y}_i^{(t)}(1-\widehat{y}_i^{(t)})$$
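To convince myself this diagonal derivative is right, I checked it against central finite differences (a quick NumPy sketch; the function name and test values are just mine):

```python
import numpy as np

def softmax(o):
    # subtract the max for numerical stability
    e = np.exp(o - o.max())
    return e / e.sum()

o = np.array([0.5, -1.2, 2.0])
yhat = softmax(o)

# analytic diagonal derivative: d yhat_i / d o_i = yhat_i * (1 - yhat_i)
analytic = yhat * (1 - yhat)

# central finite differences for the same diagonal entries
eps = 1e-6
numeric = np.empty_like(o)
for i in range(len(o)):
    o_plus, o_minus = o.copy(), o.copy()
    o_plus[i] += eps
    o_minus[i] -= eps
    numeric[i] = (softmax(o_plus)[i] - softmax(o_minus)[i]) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # → True
```

So at least the diagonal entries $\partial\widehat{y}_i^{(t)}/\partial o_i^{(t)}$ match numerically.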
My negative log-likelihood is written as $$L^{(t)}=-\sum_{i}\log{\widehat{y}_i^{(t)}}$$ and its derivative with respect to $\widehat{y}_i^{(t)}$:
$$\frac{\partial{L^{(t)}}}{\partial{\widehat{y}_i^{(t)}}}=-\frac{1}{\widehat{y}_i^{(t)}}$$
Combining the equations above, I get:
$$\frac{\partial{L^{(t)}}}{\partial{o_i^{(t)}}}=\frac{\partial{L^{(t)}}}{\partial{\widehat{y}_i^{(t)}}} \frac{\partial{\widehat{y}_i^{(t)}}}{\partial{o_i^{(t)}}}=-\frac{1}{\widehat{y}_i^{(t)}}[\widehat{y}_i^{(t)}(1-\widehat{y}_i^{(t)})]=\widehat{y}_i^{(t)}-1 $$
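As a sanity check on this combined result, I compare my analytic candidate $\widehat{y}_i^{(t)}-1$ against a finite-difference gradient of the loss $L=-\sum_i\log\widehat{y}_i$ as I wrote it above (a minimal NumPy sketch; names and test values are mine):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def loss(o):
    # the negative log-likelihood exactly as I wrote it: L = -sum_i log(yhat_i)
    return -np.log(softmax(o)).sum()

o = np.array([0.5, -1.2, 2.0])
yhat = softmax(o)

# my analytic candidate from the chain rule above
analytic = yhat - 1.0

# central finite differences for dL/do_k
eps = 1e-6
numeric = np.empty_like(o)
for k in range(len(o)):
    o_plus, o_minus = o.copy(), o.copy()
    o_plus[k] += eps
    o_minus[k] -= eps
    numeric[k] = (loss(o_plus) - loss(o_minus)) / (2 * eps)

print("analytic:", analytic)
print("numeric: ", numeric)
```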
I have two questions now:
Is my derivation above correct?
If it is, why is there a small difference between my result and the one in *Deep Learning*, which has $\textbf{1}_{i=y^{(t)}}$ where I have $1$? What does $\textbf{1}_{i=y^{(t)}}$ mean, and can it just be a simple $1$?