I'm particularly referring to Lecture 2 of Stanford's CS224n: Natural Language Processing with Deep Learning.
Professor Chris Manning writes this equation on the board (related to softmax) and says it's an application of the chain rule.
$$ \frac{\partial}{\partial v_c} \log (\sum_{w=1}^v \exp (u_w^Tv_c)) = \frac{1}{\sum_{w=1}^v \exp (u_w^Tv_c)}*\frac{\partial}{\partial v_c}\sum_{x=1}^v \exp (u_x^Tv_c)$$
I understand this: since the outer function is a $\log$, the chain rule gives $\frac{f'(x)}{f(x)}$, i.e. $\frac{1}{f(x)}$ multiplied by the derivative of the inner function.
My question is: how is the numerator of the first factor on the right-hand side equal to $1$?
$$ \frac{1}{\sum_{w=1}^v \exp (u_w^Tv_c)}$$ That is, how does differentiating
$$\sum_{w=1}^v \exp (u_w^Tv_c)$$
leave that numerator as $1$?
Thank you! :)
I believe that he hasn't yet taken the "second" chain-rule derivative (i.e. the derivative of $\sum_{w=1}^v \exp (u_w^Tv_c)$) at this step. As you mention, the "outer" derivative corresponding to the $\log$ gives: \begin{align*} \dfrac{\partial}{\partial v_c} \log(f(v_c)) = \dfrac{\dfrac{\partial}{\partial v_c} f(v_c)}{f(v_c)} = \dfrac{1}{f(v_c)}\cdot \dfrac{\partial}{\partial v_c} f(v_c). \end{align*} Now substitute $f(v_c)=\sum_{w=1}^{v} \exp(u_w^Tv_c)$ into the above expression and you arrive at the step you've described. The numerator is $1$ simply because the inner derivative has been factored out to the right and not yet evaluated.
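For completeness (and going one step beyond the board equation in question), carrying out the remaining "inner" derivative is a straightforward sketch, using the fact that $\frac{\partial}{\partial v_c} u_x^T v_c = u_x$:

\begin{align*}
\frac{\partial}{\partial v_c}\sum_{x=1}^{v}\exp(u_x^T v_c)
&= \sum_{x=1}^{v}\exp(u_x^T v_c)\,u_x,
\end{align*}

so that, combining with the outer derivative,

\begin{align*}
\frac{\partial}{\partial v_c}\log\sum_{w=1}^{v}\exp(u_w^T v_c)
&= \sum_{x=1}^{v}\frac{\exp(u_x^T v_c)}{\sum_{w=1}^{v}\exp(u_w^T v_c)}\,u_x .
\end{align*}

Each fraction in the final sum is exactly the softmax probability of word $x$ given the center word, so the gradient is an expectation of the output vectors $u_x$ under the model's distribution.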