Partial derivative of cross-entropy


I am trying to make sense of this question.

$$E(t,o)=-\sum_j t_j \log o_j$$

How did he derive the following?

$$\frac{\partial E} {\partial o_j} = \frac{-t_j}{o_j}$$

There are 2 answers below.

I think "he" is assuming that the $t_i$ and $o_i$ are independent variables; then each is constant with respect to every other, so we have:

$\dfrac{\partial t_i}{\partial o_j} = 0, \forall i, \; j, \tag 1$

$\dfrac{\partial o_i}{\partial o_j} = 0, \forall i, \; j, \; i \ne j,\tag 2$

$\dfrac{\partial o_i}{\partial o_i} = 1, \forall i; \tag 3$

applying formulas (1)-(3) to

$E(t, o) = -\displaystyle \sum_j t_j \ln o_j \tag 4$

yields

$\dfrac{\partial E}{\partial o_i} = -\displaystyle \sum_j \dfrac{\partial (t_j \ln o_j)}{\partial o_i} = -\sum_j t_j \dfrac{\partial \ln o_j}{\partial o_i} = -\dfrac{t_i}{o_i}, \; \forall i; \tag 5$
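As a quick sanity check on formula (5), we can compare the analytic gradient $-t_i/o_i$ against a central finite difference of $E$. This is only an illustrative sketch; the vectors `t` and `o` below are arbitrary example values, not anything from the original question.

```python
import math

# Arbitrary example values (t need not sum to 1 for this check)
t = [0.1, 0.7, 0.2]
o = [0.3, 0.5, 0.2]

def E(o_vec):
    """Cross-entropy E(t, o) = -sum_j t_j * ln(o_j), with t held fixed."""
    return -sum(tj * math.log(oj) for tj, oj in zip(t, o_vec))

h = 1e-6
for i in range(len(o)):
    o_plus = o.copy();  o_plus[i] += h
    o_minus = o.copy(); o_minus[i] -= h
    numeric = (E(o_plus) - E(o_minus)) / (2 * h)   # central difference
    analytic = -t[i] / o[i]                        # formula (5)
    print(f"i={i}: numeric={numeric:.6f}, analytic={analytic:.6f}")
```

The two columns agree to several decimal places, confirming that each term of the sum contributes only through its own $o_i$.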


You are missing a reformulation that Christopher Bishop (1995) used for the cross-entropy, because the formulation

\begin{equation} E=-\sum_j t_j \log (y_j) \end{equation}

does not have a minimum value of zero.

However, his reformulation of the cross-entropy error below does have a minimum value of zero:

\begin{equation} E=-\sum_j t_j \log \left( \frac{y_j}{t_j} \right). \end{equation}

For the above equation, since $\log(y_j/t_j) = \log y_j - \log t_j$ and the $t_j$ are constants, the partial derivative of the error $E$ w.r.t. $y_j$ is still simply

\begin{equation} \frac{\partial E}{\partial y_j}=-\frac{t_j}{y_j} . \end{equation}
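To illustrate both claims numerically, the sketch below (with an assumed example target distribution `t`, not from the original answer) checks that Bishop's reformulation is zero at $y = t$, positive elsewhere, and has the same derivative $-t_j/y_j$:

```python
import math

# Example target distribution (assumed values for illustration)
t = [0.2, 0.5, 0.3]

def E(y):
    """Bishop's reformulated cross-entropy: -sum_j t_j * ln(y_j / t_j)."""
    return -sum(tj * math.log(yj / tj) for tj, yj in zip(t, y))

# At y = t every ratio y_j/t_j is 1, so the error is zero (its minimum)
print(abs(E(t)) < 1e-12)          # True

# A different distribution gives a strictly positive error
y = [0.3, 0.4, 0.3]
print(E(y) > 0)                   # True

# The derivative is still -t_j / y_j: the extra ln(t_j) term is constant
h = 1e-6
j = 1
y_plus = y.copy();  y_plus[j] += h
y_minus = y.copy(); y_minus[j] -= h
numeric = (E(y_plus) - E(y_minus)) / (2 * h)
print(abs(numeric - (-t[j] / y[j])) < 1e-4)   # True
```

The positivity for $y \ne t$ is no accident: this reformulated error is exactly the Kullback-Leibler divergence between the distributions $t$ and $y$.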