Derivative of output gate in LSTM


I'm going through the "Sequence Models" course by deeplearning.ai on Coursera, but I got confused in the first homework. In the LSTM forward pass we have (among other equations) the output gate and hidden state:

$$ \Gamma_o^{<t>} = \sigma(W_o[a^{<t-1>}, x^{<t>}] + b_o) \tag{5} $$

$$ a^{<t>} = \Gamma_o^{<t>} \ast \tanh(c^{<t>}) \tag{6} $$

But then it continues, saying that in backpropagation we have the following derivative for the output gate:

$$ d\Gamma_o^{<t>} = da^{<t>} \ast \tanh(c^{<t>}) \ast \Gamma_o^{<t>} \ast (1-\Gamma_o^{<t>}) $$

The prefix $d$ means "derivative of the loss function $L$ with respect to", and $<t>$ indexes the $t$-th time step. Suppose we have some loss function $L$ and do the math: $$ d\Gamma_o^{<t>}=\frac{\partial L}{\partial \Gamma_{o}^{<t>}} = \frac{\partial L}{\partial a^{<t>}} \frac{\partial a^{<t>}}{\partial \Gamma_{o}^{<t>}}= da^{<t>}\tanh(c^{<t>}) $$ As you can see, there is no $\Gamma_o^{<t>}(1-\Gamma_o^{<t>})$ factor. What am I doing wrong?
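As a sanity check on the chain rule above, here is a quick finite-difference test (my own sketch, not course code; all variable names are made up) confirming that $\partial a^{<t>}/\partial \Gamma_o^{<t>} = \tanh(c^{<t>})$ elementwise:

```python
import numpy as np

# Check: with a = gamma_o * tanh(c), the chain rule in the question gives
# dL/dGamma_o = dL/da * tanh(c), elementwise. Names here are illustrative.
rng = np.random.default_rng(0)
c = rng.normal(size=5)
gamma_o = 1 / (1 + np.exp(-rng.normal(size=5)))  # some gate values in (0, 1)
d_a = rng.normal(size=5)                          # pretend upstream gradient dL/da

analytic = d_a * np.tanh(c)                       # dL/dGamma_o from the chain rule

# Numeric gradient of L = sum(d_a * gamma_o * tanh(c)) w.r.t. gamma_o,
# by central finite differences; for this L, dL/da is exactly d_a.
eps = 1e-6
numeric = np.empty(5)
for i in range(5):
    g_plus, g_minus = gamma_o.copy(), gamma_o.copy()
    g_plus[i] += eps
    g_minus[i] -= eps
    L_plus = np.sum(d_a * g_plus * np.tanh(c))
    L_minus = np.sum(d_a * g_minus * np.tanh(c))
    numeric[i] = (L_plus - L_minus) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))
```

So the derivation in the question is correct for $\partial L/\partial \Gamma_o^{<t>}$ itself; the question is where the extra sigmoid factor comes from.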

UPDATE: This extra factor looks like the derivative of the sigmoid function used in equation $(5)$, because: $$ \frac{\partial \sigma(x)}{\partial x} = \sigma(x)(1-\sigma(x)) $$ So maybe it is the derivative of $L$ with respect to another variable, say the sigmoid's argument $z_o^{<t>} = W_o[a^{<t-1>}, x^{<t>}] + b_o$, rather than with respect to $\Gamma_o^{<t>}$. In that case we would have: $$ \frac{\partial L}{\partial z_{o}^{<t>}}=\frac{\partial L}{\partial a^{<t>}}\frac{\partial a^{<t>}}{\partial \Gamma_o^{<t>}}\frac{\partial \Gamma_o^{<t>}}{\partial z_o^{<t>}} $$ and the last factor on the right-hand side equals $\Gamma_o^{<t>}(1-\Gamma_o^{<t>})$. But that doesn't seem to be the case, because they write $d\Gamma_o^{<t>}$. Any ideas?
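To illustrate this guess numerically, here is a small sketch (my own code, not from the course; names are made up) showing that multiplying $da^{<t>} \ast \tanh(c^{<t>})$ by $\Gamma_o^{<t>}(1-\Gamma_o^{<t>})$ gives exactly the gradient with respect to the sigmoid's pre-activation:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# The extra factor Gamma_o * (1 - Gamma_o) appears when differentiating
# through the sigmoid, i.e. w.r.t. its argument z, not w.r.t. Gamma_o.
rng = np.random.default_rng(1)
z = rng.normal(size=4)            # pre-activation of the output gate
c = rng.normal(size=4)
d_a = rng.normal(size=4)          # upstream gradient dL/da

gamma_o = sigmoid(z)
d_gamma_o = d_a * np.tanh(c)                  # dL/dGamma_o
d_z = d_gamma_o * gamma_o * (1 - gamma_o)     # dL/dz: one more chain-rule step

# Finite-difference check of dL/dz with L = sum(d_a * sigmoid(z) * tanh(c))
eps = 1e-6
numeric = np.array([
    (np.sum(d_a * sigmoid(z + eps * np.eye(4)[i]) * np.tanh(c))
     - np.sum(d_a * sigmoid(z - eps * np.eye(4)[i]) * np.tanh(c))) / (2 * eps)
    for i in range(4)
])
print(np.allclose(d_z, numeric, atol=1e-5))
```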

1 Answer
I believe the $(\ast)$ operation is a Hadamard (elementwise) product.

For a Hadamard product, say we have $F = A \ast B$. Then the gradient of a scalar loss with respect to $A$ is $dA = dF \ast B$ (and, symmetrically, $dB = dF \ast A$).
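For example, a NumPy sketch of this elementwise gradient rule (variable names are mine):

```python
import numpy as np

# If F = A * B elementwise, then for a scalar loss L,
# dL/dA = dL/dF * B elementwise.
rng = np.random.default_rng(2)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(2, 3))
dF = rng.normal(size=(2, 3))      # upstream gradient dL/dF

dA = dF * B                       # gradient w.r.t. A flows through B elementwise

# Check one entry by finite differences with L = sum(dF * (A * B))
eps = 1e-6
A_plus, A_minus = A.copy(), A.copy()
A_plus[0, 0] += eps
A_minus[0, 0] -= eps
numeric = (np.sum(dF * (A_plus * B)) - np.sum(dF * (A_minus * B))) / (2 * eps)
print(np.isclose(dA[0, 0], numeric, atol=1e-5))
```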