I'm following Andrew Ng's Coursera course on machine learning. In week 5 he introduces backpropagation and the $\delta$ term which is meant to represent the "error" a neuron has.
He gives us this formula:
$$\delta^{(3)} = (\Theta^{(3)})^T \delta^{(4)} \,.\!*\, g'(z^{(3)})$$
where $.*$ represents elementwise multiplication (since the two arguments are both vectors).
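For anyone translating the notation to code: in NumPy, for example, `*` on equal-shape arrays is exactly this elementwise product (a small illustrative sketch with made-up numbers, not values from the course):

```python
import numpy as np

# hypothetical small vectors, just to illustrate the .* (elementwise) product
u = np.array([0.2, 0.5, 0.9])
v = np.array([1.0, 2.0, 3.0])

print(u * v)  # elementwise product: [0.2, 1.0, 2.7]
```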
He then gives a formula for the $g'(z^{(3)})$ term, which is $a^{(3)}.* (1-a^{(3)})$. He says that it's possible to prove this mathematically, but I'm unable to.
Note that $g(x)$ is the sigmoid function, $1/(1+e^{-x})$, and as such $g'(x) = \frac {e^{-x}}{(1+e^{-x})^2}$. Also note that functions in this course, when applied to vectors or matrices, are assumed to be applied elementwise.
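As a numerical sanity check (not a proof), the derivative form above and the claimed form $g(x)(1-g(x))$ do seem to agree elementwise, e.g. in this small NumPy sketch:

```python
import numpy as np

def g(x):
    # sigmoid, applied elementwise
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 101)
lhs = np.exp(-x) / (1.0 + np.exp(-x)) ** 2  # g'(x) as derived above
rhs = g(x) * (1.0 - g(x))                   # the claimed closed form

print(np.allclose(lhs, rhs))  # True
```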
Also note that $z^{(3)} = \Theta^{(2)} a^{(2)}$, where $\Theta^{(2)}$ is the matrix of weights mapping layer 2 to layer 3, and $a^{(2)}$ is the vector of activations of the units in layer 2 (so $a^{(3)} = g(z^{(3)})$).
I hope this terminology is common and not specific to this course; otherwise answering this question may be difficult, since you wouldn't know what the letters mean. I can't succinctly explain the terms any further...
Nonetheless, by expanding $g'(z^{(3)})$ slightly I get: $$\frac {e^{ -\Theta^{(2)} a^{(2)} }} {1 + 2e^{ -\Theta^{(2)} a^{(2)} } + e^{ -2\Theta^{(2)} a^{(2)} } } $$
Is it feasible to get to $a^{(3)}.* (1-a^{(3)})$ from here?
