Please help me double-check this derivative


Can you please help me to check whether this is correct?

$\hat{y} = \operatorname{softmax}(hU + b_2)$, $J = \operatorname{CrossEntropy}(y, \hat{y})$.

where $\hat{y} \in \mathbb{R}^{1\times 5}$, $y \in \mathbb{R}^{1\times 5}$, $h \in \mathbb{R}^{1\times 30}$, $U \in \mathbb{R}^{30\times 5}$, $b_2 \in \mathbb{R}^{1\times 5}$, and $y$ is a one-hot vector (exactly one entry is 1, while all other entries are 0).

I want to compute the gradient of $J$ with respect to $U$ and $b_2$.


Attempt:

$\frac{dJ}{dU} = (\operatorname{softmax}(hU+b_2) - y)h$

$\frac{dJ}{db_2} =(\operatorname{softmax}(hU+b_2) - y) $

But $dJ/dU$ should have dimension $30 \times 5$, and the dimensions above don't work out: $(\hat{y} - y)$ is $1 \times 5$ and $h$ is $1 \times 30$, so the product is not even defined. Am I missing anything?

For the derivative of cross-entropy with softmax, I follow the derivation here: https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
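One way to check a candidate gradient is a finite-difference test. Below is a sketch in NumPy (not part of the original question) that compares the candidate expression $h^\top(\operatorname{softmax}(hU+b_2) - y)$ — i.e. the attempt above with $h$ transposed so the shapes are $30\times 1$ times $1\times 5$ — against numerically estimated partial derivatives of $J$:

```python
import numpy as np

np.random.seed(0)

# Shapes match the question: h is 1x30, U is 30x5, b2 and y are 1x5.
h = np.random.randn(1, 30)
U = np.random.randn(30, 5)
b2 = np.random.randn(1, 5)
y = np.zeros((1, 5))
y[0, 2] = 1.0  # one-hot target

def softmax(z):
    # Subtract the row max for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def J(U, b2):
    # Cross-entropy loss for a one-hot y.
    yhat = softmax(h @ U + b2)
    return -np.sum(y * np.log(yhat))

# Candidate analytic gradients (note the transpose on h).
yhat = softmax(h @ U + b2)
dU = h.T @ (yhat - y)   # (30,1) @ (1,5) -> (30,5)
db2 = yhat - y          # (1,5)

# Central finite differences for dJ/dU.
eps = 1e-6
num_dU = np.zeros_like(U)
for i in range(U.shape[0]):
    for j in range(U.shape[1]):
        Up, Um = U.copy(), U.copy()
        Up[i, j] += eps
        Um[i, j] -= eps
        num_dU[i, j] = (J(Up, b2) - J(Um, b2)) / (2 * eps)

print(np.abs(dU - num_dU).max())  # should be tiny if the formula is right
```

If the printed maximum difference is on the order of the finite-difference error (roughly $10^{-9}$ to $10^{-6}$), the transposed formula agrees with the numerical gradient; the same loop over the 5 entries of $b_2$ checks $dJ/db_2$.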