Log-Likelihood and Softmax


I need your help in understanding the following problem:

Given equations (80) and (78), one needs to derive equation (81) using the chain rule from calculus. However, where does the $y_j$ come from? It does not appear in either equation (80) or (78). These equations are stated in Neural Networks and Deep Learning.

Please advise.

Thanks in advance.

Best answer:

Let the $j$th activation output be $$a_j=\frac{\exp(z_j)}{S},\qquad S=\sum_{t\in\mathcal{O}} \exp(z_t),$$ where $\mathcal{O}$ is the set of output neurons. The weighted input is $$ z_k = \sum_{i\in\mathcal{I}} w_{ki}\tilde{a}_i + b_k $$ for inputs $\mathcal{I}$. Writing $y$ for the index of the desired output, the log-likelihood cost is $$ C = -\ln(a_y)= -\left[ \sum_{i\in\mathcal{I}} w_{yi}\tilde{a}_i + b_y \right] + \ln(S), $$ with derivative \begin{align} \frac{\partial C}{\partial b_j} &= -\delta_{yj} + \frac{1}{S}\frac{\partial S}{\partial b_j}\\ &= -{y_j} + \frac{1}{S}\sum_{t\in\mathcal{O}}\exp(z_t) \frac{\partial z_t}{\partial b_j} \\ &= -{y_j} + \frac{\exp(z_j)}{S} \\ &= a_j - y_j, \end{align} where $\delta_{yj}$ is the Kronecker delta (the third line uses $\partial z_t/\partial b_j=\delta_{tj}$), and $y_j=\delta_{yj}$ is the $j$th component of the one-hot target vector $y$.
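As a sanity check on the result $\partial C/\partial b_j = a_j - y_j$, here is a small numerical sketch (the vector sizes and the choice of correct class are arbitrary, not from the book): it compares a central finite-difference gradient of $C = -\ln(a_y)$ against the analytic formula.

```python
import numpy as np

# Hypothetical small example: verify dC/db_j = a_j - y_j numerically.
rng = np.random.default_rng(0)
n = 5
z = rng.normal(size=n)   # weighted inputs before adding a bias perturbation
y = 2                    # index of the correct output (arbitrary choice)

def cost(b):
    # C = -ln(a_y), with a = softmax(z + b)
    s = np.exp(z + b)
    a = s / s.sum()
    return -np.log(a[y])

b = np.zeros(n)
eps = 1e-6
# Central finite differences in each bias coordinate
numeric = np.array([
    (cost(b + eps * np.eye(n)[j]) - cost(b - eps * np.eye(n)[j])) / (2 * eps)
    for j in range(n)
])

# Analytic gradient a_j - y_j, with y as a one-hot vector
s = np.exp(z + b)
a = s / s.sum()
y_onehot = np.eye(n)[y]
analytic = a - y_onehot

print(np.max(np.abs(numeric - analytic)))  # small: the two gradients agree
```

The finite-difference and analytic gradients agree to roughly the precision of the difference scheme, which confirms the derivation above.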

The author remarks in a (cryptic) sidenote that $y$ is the vector of zeros with a single $1$ at the position of the desired output; thus $y_j$ equals $1$ when $j$ is the correct output and $0$ otherwise, which is exactly the Kronecker delta $\delta_{yj}$ above.