I am studying the book of Goodfellow et al. on deep learning, and I am stuck at equation (10.19).
Said equation gives the gradient of the Loss with respect to the hidden layer.
Now, if the Loss were a vector, and since the hidden state is a vector too, differentiating one with respect to the other would give a Jacobian matrix.
However, with the definition given, the result is a vector with the same dimensions as the hidden layer.
What am I missing?
Look up page 199, section 6.5.2 of the book.
This is a straightforward application of the chain rule of calculus.
V is the Jacobian matrix in this case (eq. 10.19): since o(t) = V h(t) + c, the partial derivatives ∂o_i(t)/∂h_j(t) are just the entries of V, and the chain rule uses its transpose, giving ∇_{h(t)} L = V^T ∇_{o(t)} L. Note that L itself is a scalar, so its gradient with respect to h(t) is a vector of the same size as h(t), not a Jacobian.
There is no assumption that h and o are vectors of the same size: the size of h depends on how many recurrent cells you have in your layer, and the size of o depends on how many cells you have in the output layer.
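A quick numerical sketch may make this concrete. The specific V, c, and squared-error loss below are arbitrary choices for illustration, not the book's exact setup; the point is only that V^T ∇_{o} L is a vector with the same shape as h (even when h and o have different sizes) and that it matches a finite-difference estimate of the gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
n_h, n_o = 5, 3                    # h and o deliberately different sizes
V = rng.normal(size=(n_o, n_h))    # output weights: o = V h + c
c = rng.normal(size=n_o)
h = rng.normal(size=n_h)
y = rng.normal(size=n_o)           # arbitrary target for the toy loss

def loss(h):
    o = V @ h + c
    return 0.5 * np.sum((o - y) ** 2)   # scalar loss

# Chain rule: grad_h L = V^T grad_o L
grad_o = (V @ h + c) - y
grad_h = V.T @ grad_o
assert grad_h.shape == h.shape     # a vector the size of h, not a Jacobian

# Central finite-difference check of the analytic gradient
eps = 1e-6
fd = np.array([(loss(h + eps * e) - loss(h - eps * e)) / (2 * eps)
               for e in np.eye(n_h)])
print(np.allclose(grad_h, fd, atol=1e-6))   # → True
```

The Jacobian ∂o/∂h = V only appears as an intermediate factor; because L is scalar, the final gradient collapses back to a vector.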