Consider the cost function for softmax regression (I will use the term multinomial logistic regression):
$$ J( \theta ) = - \sum^m_{i=1} \sum^K_{k=1} 1 \{ y^{(i)} = k \} \log p(y^{(i)} = k \mid x^{(i)} ; \theta) $$
According to the UFLDL tutorial, the gradient of this function is:
$$ \bigtriangledown_{ \theta^{(k)} }J( \theta ) = -\sum^{m}_{i=1} [x^{(i)} (1 \{ y^{(i)} = k \} - p(y^{(i)} = k \mid x^{(i)} ; \theta) ) ] $$
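(As a sanity check, I verified the tutorial's formula numerically against finite differences of $J(\theta)$. Here is a quick NumPy sketch on random toy data; all variable names are my own, none come from the tutorial's code. It confirms their gradient expression is correct, so the mistake must be in my derivation below.)

```python
import numpy as np

np.random.seed(0)

# Tiny synthetic problem: m examples, n features, K classes.
m, n, K = 20, 5, 3
X = np.random.randn(m, n)          # x^{(i)} as rows
y = np.random.randint(0, K, m)     # labels y^{(i)} in {0, ..., K-1}
Theta = np.random.randn(K, n)      # theta^{(k)} as rows

def softmax_probs(Theta, X):
    """p(y = k | x; theta) for every example (rows) and class (cols)."""
    Z = X @ Theta.T                      # m x K matrix of inner products
    Z -= Z.max(axis=1, keepdims=True)    # stabilize the exponentials
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def cost(Theta):
    P = softmax_probs(Theta, X)
    # J = -sum_i log p(y^{(i)} | x^{(i)}): the indicator picks one term per i
    return -np.log(P[np.arange(m), y]).sum()

def grad(Theta):
    P = softmax_probs(Theta, X)
    Y = np.zeros_like(P)
    Y[np.arange(m), y] = 1.0             # one-hot encoding of the indicator
    # The tutorial's formula: -sum_i x^{(i)} (1{y^{(i)}=k} - p_k)
    return -(Y - P).T @ X                # K x n, row k is grad wrt theta^{(k)}

# Central finite-difference check of the analytic gradient.
eps = 1e-6
G = grad(Theta)
G_num = np.zeros_like(Theta)
for idx in np.ndindex(*Theta.shape):
    T_plus, T_minus = Theta.copy(), Theta.copy()
    T_plus[idx] += eps
    T_minus[idx] -= eps
    G_num[idx] = (cost(T_plus) - cost(T_minus)) / (2 * eps)

print(np.max(np.abs(G - G_num)))  # maximum elementwise discrepancy
```

On my machine the discrepancy is negligibly small, so I trust the formula as stated.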
However, the tutorial does not include the derivation. Does anyone know how to derive it?

I have tried taking the derivative myself, but even my initial steps seem to disagree with the final form they give.
So I first took the gradient $\bigtriangledown_{ \theta^{(k)} }J( \theta )$ as they suggested:
$$ \bigtriangledown_{ \theta^{(k)} } J( \theta ) = - \bigtriangledown_{ \theta^{(k)} } \sum^m_{i=1} \sum^K_{k=1} 1 \{ y^{(i)} = k \} \log p(y^{(i)} = k \mid x^{(i)} ; \theta) $$
but since we are taking the gradient with respect to $\theta^{(k)}$, only the term that matches this specific $k$ will be non-zero when we take derivatives. Hence:
$$ \bigtriangledown_{ \theta^{(k)} } J( \theta ) = - \sum^m_{i=1} \bigtriangledown_{ \theta^{(k)} } \log p(y^{(i)} = k \mid x^{(i)} ; \theta) $$
Then, applying the chain rule to the logarithm, we get:
$$ - \sum^m_{i=1} \frac{1}{p(y^{(i)} = k \mid x^{(i)} ; \theta)} \bigtriangledown_{ \theta^{(k)} } p(y^{(i)} = k \mid x^{(i)} ; \theta) $$
However, at this point my expression looks so different from what the UFLDL tutorial has, and the indicator function has disappeared completely, that I suspect I made a mistake somewhere. On top of that, the final gradient contains a difference, but no differences/subtractions appear anywhere in my derivation. I suspect the difference might arise when applying the quotient rule to the softmax probability, but the disappearing indicator function still worries me. Any ideas?