Softmax Regression Derivative

This website, http://deeplearning.stanford.edu/wiki/index.php/Softmax_Regression, claims that for the softmax (multinomial) regression cost $$ J(\theta) = -\frac{1}{m}\sum_{i=1}^m \sum_{j=1}^k 1\{y^i =j\} \log\frac{e^{\theta^T_j x^i}}{\sum_{l=1}^k e^{\theta^{T}_lx^i}} $$

the gradient is $$\nabla_{\theta_j} J = - \frac{1}{m} \sum_{i=1}^m [x^{(i)}1\{y_i = j\}-\frac{e^{\theta^{T}_jx^i}}{\sum e^{\theta^{T}_lx^i}}] $$

I can't get this to work out.

I get this:

$$-\frac{1}{m}\sum_{i=1}^m 1\{y^i=j\}x^i - 1\{y^i = j\}\frac{e^{\theta^{T}_jx^i}}{\sum e^{\theta^{T}_lx^i}}x^i.$$

What am I doing wrong?

In the inner sum of $J(\theta)$, split the log of the ratio into a difference of logs: $$\begin{align} \sum_{j=1}^k 1\{y^i =j\} \log\frac{e^{\theta^T_j x^i}}{\sum_{l=1}^k e^{\theta^{T}_lx^i}} &=A-B, \end{align} $$ where $$ A=\sum_{j=1}^k 1\{y^i =j\} \log(e^{\theta^T_j x^i})=\sum_{j=1}^k 1\{y^i =j\}\theta^T_j x^i $$ and $$ B=\sum_{j=1}^k1\{y^i =j\}\log\left({\sum_{l=1}^k e^{\theta^{T}_lx^i}}\right) =\log\left({\sum_{l=1}^k e^{\theta^{T}_lx^i}}\right)\sum_{j=1}^k1\{y^i =j\} = \log\left({\sum_{l=1}^k e^{\theta^{T}_lx^i}}\right), $$ because $\log\left({\sum_{l=1}^k e^{\theta^{T}_lx^i}}\right)$ is free of $j$ and can be pulled out of the summation, leaving $\sum_{j=1}^k 1\{y^i =j\}$, which equals $1$ since $y^i$ must equal exactly one of the possible $j$ values.
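Carrying out those two derivatives explicitly (a step left implicit above): $\theta_j$ appears only in the $j$-th term of $A$, so $$ \nabla_{\theta_j} A = 1\{y^i = j\}\,x^i, $$ while the chain rule applied to the log-sum-exp gives $$ \nabla_{\theta_j} B = \frac{e^{\theta^T_j x^i}}{\sum_{l=1}^k e^{\theta^{T}_lx^i}}\,x^i. $$ Subtracting, $$ \nabla_{\theta_j}(A - B) = x^i\left(1\{y^i = j\} - \frac{e^{\theta^T_j x^i}}{\sum_{l=1}^k e^{\theta^{T}_lx^i}}\right). $$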

Differentiating $A$ and $B$ separately with respect to $\theta_j$, and then collecting the results, gives the gradient reported on the website, which is $$ \nabla_{\theta_j} J = - \frac{1}{m} \sum_{i=1}^m \left[x^{(i)}\left(1\{y_i = j\}-\frac{e^{\theta^{T}_jx^i}}{\sum e^{\theta^{T}_lx^i}}\right)\right]. $$ (Note that you left out a set of parentheses in what you typed.)
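If you want to convince yourself the gradient is right, a finite-difference check is quick to do. The sketch below is my own (the function and variable names are not from the post, and it assumes labels $y^i \in \{0,\dots,k-1\}$ with $\Theta$ stored as a $k \times d$ matrix whose rows are the $\theta_j$):

```python
import numpy as np

def softmax_loss_and_grad(Theta, X, y):
    """Softmax regression cost J(Theta) and its gradient.

    Theta: (k, d) matrix, row j is theta_j; X: (m, d) inputs; y: (m,) labels in {0..k-1}.
    Returns (J, grad) with grad of shape (k, d).
    """
    m = X.shape[0]
    logits = X @ Theta.T                         # (m, k), entry [i, j] = theta_j^T x^i
    logits -= logits.max(axis=1, keepdims=True)  # stabilize exponentials
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)            # P[i, j] = softmax probability
    Y = np.eye(Theta.shape[0])[y]                # one-hot indicators 1{y^i = j}
    J = -np.log(P[np.arange(m), y]).mean()
    grad = -(Y - P).T @ X / m                    # -1/m sum_i x^i (1{y^i=j} - P[i,j])
    return J, grad

# Compare the analytic gradient against central finite differences.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 4, size=5)
Theta = rng.normal(size=(4, 3))

J, grad = softmax_loss_and_grad(Theta, X, y)
num = np.zeros_like(Theta)
eps = 1e-6
for idx in np.ndindex(*Theta.shape):
    Tp, Tm = Theta.copy(), Theta.copy()
    Tp[idx] += eps
    Tm[idx] -= eps
    num[idx] = (softmax_loss_and_grad(Tp, X, y)[0]
                - softmax_loss_and_grad(Tm, X, y)[0]) / (2 * eps)

print(np.max(np.abs(grad - num)))  # maximum discrepancy; should be tiny
```

If instead you implement the (incorrect) version without the parentheses, the check fails immediately, which is a handy way to catch exactly this kind of transcription error.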