Derivative of output gate in LSTM


I'm going through the "Sequence Models" course by deeplearning.ai on Coursera, but I got confused in the first homework. In the LSTM forward pass we have (among other equations) the output gate and hidden state:

$$ \Gamma_o^{<t>} = \sigma(W_o[a^{<t-1>}, x^{<t>}] + b_o) \tag{5} $$

$$ a^{<t>} = \Gamma_o^{<t>} \ast \tanh(c^{<t>}) \tag{6} $$

But then it continues, saying that in backpropagation we have the following derivative for the output gate:

$$ d\Gamma_o^{<t>} = da^{<t>} \ast \tanh(c^{<t>}) \ast \Gamma_o^{<t>} \ast (1-\Gamma_o^{<t>}) $$

The prefix $d$ means "derivative of the loss function $L$ with respect to", and $<t>$ indexes the $t$-th time step. Suppose we have some loss function $L$ and do the math: $$ d\Gamma_o^{<t>}=\frac{\partial L}{\partial \Gamma_{o}^{<t>}} = \frac{\partial L}{\partial a^{<t>}} \frac{\partial a^{<t>}}{\partial \Gamma_{o}^{<t>}}= da^{<t>}\tanh(c^{<t>}) $$ As you can see, there is no $\Gamma_o^{<t>}(1-\Gamma_o^{<t>})$ factor. What am I doing wrong?
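As a sanity check on the chain rule above, here is a quick finite-difference test (my own sketch, not course code; all variable names are made up) confirming that $\partial a^{<t>}/\partial \Gamma_o^{<t>} = \tanh(c^{<t>})$ elementwise:

```python
import numpy as np

# Check: with a = gamma_o * tanh(c), the chain rule in the question gives
# dL/dGamma_o = dL/da * tanh(c), elementwise. Names here are illustrative.
rng = np.random.default_rng(0)
c = rng.normal(size=5)
gamma_o = 1 / (1 + np.exp(-rng.normal(size=5)))  # some gate values in (0, 1)
d_a = rng.normal(size=5)                          # pretend upstream gradient dL/da

analytic = d_a * np.tanh(c)                       # dL/dGamma_o from the chain rule

# Numeric gradient of L = sum(d_a * gamma_o * tanh(c)) w.r.t. gamma_o,
# by central finite differences; for this L, dL/da is exactly d_a.
eps = 1e-6
numeric = np.empty(5)
for i in range(5):
    g_plus, g_minus = gamma_o.copy(), gamma_o.copy()
    g_plus[i] += eps
    g_minus[i] -= eps
    L_plus = np.sum(d_a * g_plus * np.tanh(c))
    L_minus = np.sum(d_a * g_minus * np.tanh(c))
    numeric[i] = (L_plus - L_minus) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))
```

So the derivation in the question is correct for $\partial L/\partial \Gamma_o^{<t>}$ itself; the question is where the extra sigmoid factor comes from.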

UPDATE: This extra factor looks like the derivative of the sigmoid function used in equation $(5)$, because: $$ \frac{\partial \sigma(x)}{\partial x} = \sigma(x)(1-\sigma(x)) $$ So maybe it is the derivative of $L$ with respect to another variable, say the sigmoid's argument $z_o^{<t>} = W_o[a^{<t-1>}, x^{<t>}] + b_o$, rather than with respect to $\Gamma_o^{<t>}$. In that case we would have: $$ \frac{\partial L}{\partial z_{o}^{<t>}}=\frac{\partial L}{\partial a^{<t>}}\frac{\partial a^{<t>}}{\partial \Gamma_o^{<t>}}\frac{\partial \Gamma_o^{<t>}}{\partial z_o^{<t>}} $$ and the last factor on the right-hand side equals $\Gamma_o^{<t>}(1-\Gamma_o^{<t>})$. But that doesn't seem to be the case, because they write $d\Gamma_o^{<t>}$. Any ideas?
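To illustrate this guess numerically, here is a small sketch (my own code, not from the course; names are made up) showing that multiplying $da^{<t>} \ast \tanh(c^{<t>})$ by $\Gamma_o^{<t>}(1-\Gamma_o^{<t>})$ gives exactly the gradient with respect to the sigmoid's pre-activation:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# The extra factor Gamma_o * (1 - Gamma_o) appears when differentiating
# through the sigmoid, i.e. w.r.t. its argument z, not w.r.t. Gamma_o.
rng = np.random.default_rng(1)
z = rng.normal(size=4)            # pre-activation of the output gate
c = rng.normal(size=4)
d_a = rng.normal(size=4)          # upstream gradient dL/da

gamma_o = sigmoid(z)
d_gamma_o = d_a * np.tanh(c)                  # dL/dGamma_o
d_z = d_gamma_o * gamma_o * (1 - gamma_o)     # dL/dz: one more chain-rule step

# Finite-difference check of dL/dz with L = sum(d_a * sigmoid(z) * tanh(c))
eps = 1e-6
numeric = np.array([
    (np.sum(d_a * sigmoid(z + eps * np.eye(4)[i]) * np.tanh(c))
     - np.sum(d_a * sigmoid(z - eps * np.eye(4)[i]) * np.tanh(c))) / (2 * eps)
    for i in range(4)
])
print(np.allclose(d_z, numeric, atol=1e-5))
```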

1 Answer
I believe the $(\ast)$ operation is a Hadamard (elementwise) product.

For a Hadamard product, say we have $F = A \ast B$. Then the gradient of a scalar loss with respect to $A$ is $dA = dF \ast B$ (and, symmetrically, $dB = dF \ast A$).
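For example, a NumPy sketch of this elementwise gradient rule (variable names are mine):

```python
import numpy as np

# If F = A * B elementwise, then for a scalar loss L,
# dL/dA = dL/dF * B elementwise.
rng = np.random.default_rng(2)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(2, 3))
dF = rng.normal(size=(2, 3))      # upstream gradient dL/dF

dA = dF * B                       # gradient w.r.t. A flows through B elementwise

# Check one entry by finite differences with L = sum(dF * (A * B))
eps = 1e-6
A_plus, A_minus = A.copy(), A.copy()
A_plus[0, 0] += eps
A_minus[0, 0] -= eps
numeric = (np.sum(dF * (A_plus * B)) - np.sum(dF * (A_minus * B))) / (2 * eps)
print(np.isclose(dA[0, 0], numeric, atol=1e-5))
```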