I am studying the book of Goodfellow et al. on deep learning, and I am stuck at equation (10.19).
Said equation gives the gradient of the Loss with respect to the hidden layer.
Now, if the Loss were a vector, and since the hidden state is a vector too, differentiating one with respect to the other would give a Jacobian matrix.
However, with the definition given, the result is a vector with the same dimensions as the hidden layer.
What am I missing?
Look up page 199, section 6.5.2 of the book.
This is a straightforward application of the chain rule of calculus.
V is the Jacobian matrix in this case (eq. 10.19): since o(t) = V h(t) + c, the partial derivatives ∂o_i(t)/∂h_j(t) are just the entries of V, and the chain rule uses its transpose, giving ∇_{h(t)} L = V^T ∇_{o(t)} L. Note that L itself is a scalar, so its gradient with respect to h(t) is a vector of the same size as h(t), not a Jacobian.
There is no assumption that h and o are vectors of the same size: the size of h depends on how many recurrent cells you have in your layer, and the size of o depends on how many cells you have in the output layer.
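A quick numerical sketch may make this concrete. The specific V, c, and squared-error loss below are arbitrary choices for illustration, not the book's exact setup; the point is only that V^T ∇_{o} L is a vector with the same shape as h (even when h and o have different sizes) and that it matches a finite-difference estimate of the gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
n_h, n_o = 5, 3                    # h and o deliberately different sizes
V = rng.normal(size=(n_o, n_h))    # output weights: o = V h + c
c = rng.normal(size=n_o)
h = rng.normal(size=n_h)
y = rng.normal(size=n_o)           # arbitrary target for the toy loss

def loss(h):
    o = V @ h + c
    return 0.5 * np.sum((o - y) ** 2)   # scalar loss

# Chain rule: grad_h L = V^T grad_o L
grad_o = (V @ h + c) - y
grad_h = V.T @ grad_o
assert grad_h.shape == h.shape     # a vector the size of h, not a Jacobian

# Central finite-difference check of the analytic gradient
eps = 1e-6
fd = np.array([(loss(h + eps * e) - loss(h - eps * e)) / (2 * eps)
               for e in np.eye(n_h)])
print(np.allclose(grad_h, fd, atol=1e-6))   # → True
```

The Jacobian ∂o/∂h = V only appears as an intermediate factor; because L is scalar, the final gradient collapses back to a vector.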