I need your help in understanding the following problem:
Given equations (80) and (78), one needs to derive equation (81) using the chain rule from calculus. However, where does the $y_j$ come from? This term does not appear in either equation (80) or (78). These equations are stated in Neural Networks and Deep Learning.
Please advise.
Thanks in advance.
Let the $j$th softmax activation be $$a_j=\frac{\exp(z_j)}{S},\qquad S=\sum_{t\in\mathcal{O}} \exp(z_t)$$ over the output units $\mathcal{O}$, with weighted input $$ z_k = \sum_{i\in\mathcal{I}} w_{ki}\tilde{a}_i + b_k $$ over the inputs $\mathcal{I}$. Writing $y$ for the index of the correct class, the log-likelihood cost is $$ C = -\ln(a_y)= -\left[ \sum_{i\in\mathcal{I}} w_{yi}\tilde{a}_i + b_y \right] + \ln(S), $$ with derivative (using $\partial z_t/\partial b_j = \delta_{tj}$) \begin{align} \frac{\partial C}{\partial b_j} &= -\delta_{yj} + \frac{1}{S}\frac{\partial S}{\partial b_j}\\ &= -{y_j} + \frac{1}{S}\sum_{t\in\mathcal{O}}\exp(z_t) \frac{\partial z_t}{\partial b_j} \\ &= -{y_j} + \frac{\exp(z_j)}{S} \\ &= a_j - y_j. \end{align} The Kronecker delta $\delta_{yj}$ is exactly the $j$th component of the one-hot target vector $y$: $y_j = 1$ if $j=y$ and $y_j = 0$ otherwise. That is where the $y_j$ in equation (81) comes from.
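You can sanity-check the result $\partial C/\partial b_j = a_j - y_j$ numerically. The sketch below (names `z_pre`, `b`, `y` are my own, not from the book) compares the analytic gradient against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n_out = 5
z_pre = rng.normal(size=n_out)  # the fixed w.a~ part of z
b = rng.normal(size=n_out)      # biases we differentiate with respect to
y = 2                           # index of the correct class

def cost(b):
    """Log-likelihood cost C = -ln(a_y) with softmax activations."""
    z = z_pre + b
    a = np.exp(z) / np.exp(z).sum()
    return -np.log(a[y])

# Analytic gradient: a_j - y_j, with y one-hot.
z = z_pre + b
a = np.exp(z) / np.exp(z).sum()
y_onehot = np.eye(n_out)[y]
grad_analytic = a - y_onehot

# Central finite differences for each bias component.
eps = 1e-6
grad_numeric = np.array([
    (cost(b + eps * np.eye(n_out)[j]) - cost(b - eps * np.eye(n_out)[j])) / (2 * eps)
    for j in range(n_out)
])

print(np.max(np.abs(grad_analytic - grad_numeric)))  # ~0 up to round-off
```

The two gradients agree to within finite-difference round-off, confirming that the $y_j$ term really is the one-hot indicator $\delta_{yj}$.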
The author remarks in a (cryptic) sidenote that $y$ is the vector of zeros everywhere except for a 1 at the $y$th position (the correct class), so its $j$th component is $y_j=\delta_{yj}$.