$\frac{ \partial{E} }{ \partial{s} } = \frac{ \partial{E} }{ \partial{z} } \frac{ \partial{z} }{ \partial{s} } = z - t$?


The following is a lecture slide from a machine learning class:

Cross Entropy


For classification tasks, target $t$ is either $0$ or $1$, so it is better to use $$E=-t\log(z)-(1-t)\log(1-z)$$ This can be justified mathematically, and works well in practice -- especially when negative examples vastly outweigh positive ones. It also makes the backprop computations simpler $$\begin{align}\frac{\partial E}{\partial z}&=\frac{z-t}{z(1-z)}\\ \text{if}\qquad z&=\frac{1}{1+e^{-s}},\\ \frac{\partial E}{\partial s}&=\frac{\partial E}{\partial z}\frac{\partial z}{\partial s}=z-t\end{align}$$
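The slide's claimed identity $\partial E/\partial s = z - t$ can also be sanity-checked numerically with a central difference; a minimal sketch (the helper names `cross_entropy` and `grad_check` are mine, not from the slide):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cross_entropy(s, t):
    # E = -t*log(z) - (1-t)*log(1-z), with z = sigmoid(s)
    z = sigmoid(s)
    return -t * math.log(z) - (1 - t) * math.log(1 - z)

def grad_check(s, t, eps=1e-6):
    # Central-difference estimate of dE/ds vs. the slide's closed form z - t
    numeric = (cross_entropy(s + eps, t) - cross_entropy(s - eps, t)) / (2 * eps)
    analytic = sigmoid(s) - t
    return numeric, analytic

numeric, analytic = grad_check(s=0.7, t=1)
print(numeric, analytic)  # the two values agree to well under 1e-6
```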

By my calculations,

$$ \dfrac{ \partial{E} }{ \partial{s} } = \dfrac{ \partial{E} }{ \partial{z} } \dfrac{ \partial{z} }{ \partial{s} } = \left[ \dfrac{z - t}{z(1 - z)} \right] \left[ \dfrac{e^{-s} }{ (1 + e^{-s})^2 } \right] = \dfrac{e^{-s} (z - t) }{ z(1 - z)(1 + e^{-s})^2 } = \dfrac{e^{-s} (z - t) }{ z(1 - z)(1 + 2e^{-s} + e^{-2s}) } = \dfrac{e^{-s} (z - t) }{ z + 2ze^{-s} + ze^{-2s} - z^2 - 2z^2e^{-s} - z^2 e^{-2s} }$$

$$ z - t = \dfrac{1}{1 + e^{-s}} - t = \dfrac{1 - t(1 + e^{-s})}{1 + e^{-s}} $$

Is the slide incorrect, or is there something I'm missing?

I would greatly appreciate it if people could please take the time to clarify this.

Best Answer

As you correctly calculated,

$$\dfrac{ \partial{z} }{ \partial{s} } = \dfrac{e^{-s} }{ (1 + e^{-s})^2 }$$

and since

$$z = \dfrac{1}{1 + e^{-s}}$$

we also have that

$$\dfrac{1}{z}-1 = \dfrac{1-z}{z} = e^{-s}$$

and

$$z^2 = \dfrac{1}{(1 + e^{-s})^2}$$

which you can combine to get

$$\dfrac{ \partial{z} }{ \partial{s} } = \dfrac{e^{-s} }{ (1 + e^{-s})^2 } = e^{-s}\, z^2 = \dfrac{1-z}{z}\, z^2 = z(1-z) \; .$$

Substituting this back into your own expression,

$$\dfrac{ \partial{E} }{ \partial{s} } = \dfrac{z - t}{z(1 - z)} \cdot z(1-z) = z - t \; ,$$

so the slide is correct: the factor $z(1-z)$ cancels, which is exactly what makes the sigmoid/cross-entropy pairing convenient for backprop.
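The identity $\partial z/\partial s = z(1-z)$ can also be confirmed numerically at a few points; a minimal sketch (the helper name `dz_ds_numeric` is mine):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dz_ds_numeric(s, eps=1e-6):
    # Central-difference estimate of dz/ds
    return (sigmoid(s + eps) - sigmoid(s - eps)) / (2 * eps)

# Compare the numeric derivative against the closed form z*(1-z)
checks = [(s, dz_ds_numeric(s), sigmoid(s) * (1 - sigmoid(s)))
          for s in (-2.0, 0.0, 1.5)]
for s, numeric, closed_form in checks:
    print(f"s={s:+.1f}: numeric={numeric:.6f}, z(1-z)={closed_form:.6f}")
```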