The following is a lecture slide from a machine learning class:
Cross Entropy
For classification tasks, target $t$ is either $0$ or $1$, so better to use $$E=-t\log(z)-(1-t)\log(1-z)$$ This can be justified mathematically, and works well in practice -- especially when negative examples vastly outweigh positive ones. It also makes the backprop computations simpler $$\begin{align}\frac{\partial E}{\partial z}&=\frac{z-t}{z(1-z)}\\ \text{if}\qquad z&=\frac{1}{1+e^{-s}},\\ \frac{\partial E}{\partial s}&=\frac{\partial E}{\partial z}\frac{\partial z}{\partial s}=z-t\end{align}$$
By my calculations,
$$ \dfrac{ \partial{E} }{ \partial{s} } = \dfrac{ \partial{E} }{ \partial{z} } \dfrac{ \partial{z} }{ \partial{s} } = \left[ \dfrac{z - t}{z(1 - z)} \right] \left[ \dfrac{e^{-s} }{ (1 + e^{-s})^2 } \right] = \dfrac{e^{-s} (z - t) }{ z(1 - z)(1 + e^{-s})^2 } = \dfrac{e^{-s} (z - t) }{ z(1 - z)(1 + 2e^{-s} + e^{-2s}) } = \dfrac{e^{-s} (z - t) }{ z + 2ze^{-s} + ze^{-2s} - z^2 - 2z^2e^{-s} - z^2 e^{-2s} }$$
$$ z - t = \dfrac{1}{1 + e^{-s}} - t = \dfrac{1 - t(1 + e^{-s})}{1 + e^{-s}} $$
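As a sanity check on my algebra, I compared my expression for $\partial z/\partial s$ against a central finite-difference estimate (a quick Python sketch; the sample points and step size are arbitrary choices of mine):

```python
import math

def sigmoid(s):
    """Logistic function z = 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + math.exp(-s))

def dz_ds_analytic(s):
    # My hand-derived expression: e^{-s} / (1 + e^{-s})^2
    return math.exp(-s) / (1.0 + math.exp(-s)) ** 2

def dz_ds_numeric(s, h=1e-6):
    # Central finite difference approximation of dz/ds
    return (sigmoid(s + h) - sigmoid(s - h)) / (2 * h)

for s in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    print(f"s={s:+.1f}: analytic={dz_ds_analytic(s):.8f} "
          f"numeric={dz_ds_numeric(s):.8f}")
```

The two columns agree to several decimal places, so at least that factor of my calculation seems right.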
Is the slide incorrect, or is there something I'm missing?
I would greatly appreciate any clarification.
As you correctly calculated,
$$\dfrac{ \partial{z} }{ \partial{s} } = \dfrac{e^{-s} }{ (1 + e^{-s})^2 }$$
and since
$$z = \dfrac{1}{1 + e^{-s}}$$
we also have that
$$\dfrac{1}{z}-1 = \dfrac{1-z}{z} = e^{-s}$$
and
$$z^2 = \dfrac{1}{(1 + e^{-s})^2}$$
which you can combine to get
$$\dfrac{ \partial{z} }{ \partial{s} } = \dfrac{e^{-s} }{ (1 + e^{-s})^2 } = z^2 e^{-s} = z^2 \, \dfrac{1-z}{z} = z(1-z) \; .$$
Substituting this back into the chain rule gives
$$\dfrac{ \partial{E} }{ \partial{s} } = \dfrac{z-t}{z(1-z)} \, z(1-z) = z - t \;,$$
exactly as on the slide. So the slide is correct; your expression is the same quantity, just not yet simplified using $e^{-s} = \frac{1-z}{z}$.
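To see the whole chain work end to end, here is a small numerical check (a sketch in Python; the sample values of $s$ and $t$ are arbitrary) comparing the analytic gradient $\partial E/\partial s = z - t$ with a finite-difference estimate of the cross-entropy loss:

```python
import math

def cross_entropy_grad_check(s, t, h=1e-6):
    """Compare the analytic gradient dE/ds = z - t with a central
    finite-difference estimate of E(s) = -t*log(z) - (1-t)*log(1-z),
    where z = sigmoid(s)."""
    def E(s_):
        z = 1.0 / (1.0 + math.exp(-s_))
        return -t * math.log(z) - (1 - t) * math.log(1 - z)

    z = 1.0 / (1.0 + math.exp(-s))
    analytic = z - t                          # the slide's simplified form
    numeric = (E(s + h) - E(s - h)) / (2 * h) # finite-difference estimate
    return analytic, numeric

for s, t in [(0.5, 1.0), (-2.0, 0.0), (3.0, 1.0)]:
    a, n = cross_entropy_grad_check(s, t)
    print(f"s={s:+.1f} t={t}: analytic={a:+.6f} numeric={n:+.6f}")
```

Both columns agree, which confirms the slide's $z - t$ and your longer expression are the same function.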