Understanding the impact of error on network output


I am currently going through Michael Nielsen's book. Everything has been crystal clear so far, until now, as I start looking at the fundamental equations of backpropagation.

He introduces $\delta_j^l$ as the error in the $j^{th}$ neuron of the $l^{th}$ layer, and motivates it with a small addition $\Delta z_j^l$ to the neuron's weighted input $z_j^l$. Hence, the neuron's output becomes

$$\sigma(z_j^l + \Delta z_j^l)$$

He concludes the paragraph by saying that the change propagates through later layers of the network, eventually causing the overall cost to change by an amount

$$\frac{\partial C}{\partial z_j^l}\Delta z_j^l$$

This is what I don't understand: where does this quantity come from, and how can I derive it from the original expression $\sigma(z_j^l + \Delta z_j^l)$?


This follows simply from the Taylor expansion of the cost. If $z^\ell_j$ is the weighted input to neuron $j$ in layer $\ell$, then we can view the cost as a function $C(z^\ell_j)$, where we hold all of the other variables in the network constant. In other words, view $C$ as a (complicated) single-variable function of that one input, $ C : \mathbb{R}\rightarrow\mathbb{R} $.

Then, the linear Taylor expansion of $C$ about $\tilde{z}^\ell_j$ is given by: $$ C(z^\ell_j) \approx C(\tilde{z}^\ell_j) + \frac{\partial C}{\partial z^\ell_j}(\tilde{z}^\ell_j)\left[ z^\ell_j - \tilde{z}^\ell_j \right] \tag{1} $$ This means that if the original value of input is given by $\tilde{z}^\ell_j$, then we can perturb it by $ \Delta {z}^\ell_j = {z}^\ell_j - \tilde{z}^\ell_j $. (This is the "Demon's perturbation"). This will cause the output to change by $ \Delta C = C({z}^\ell_j) - C(\tilde{z}^\ell_j) $. Using equation (1), we get that: $$ \Delta C = \frac{\partial C}{\partial z^\ell_j}(\tilde{z}^\ell_j)\;\Delta {z}^\ell_j $$ as expected. Note that this is accurate only for small changes $\Delta {z}^\ell_j$.
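You can also check this numerically. Below is a minimal sketch with a toy cost of my own choosing (not Nielsen's actual network): treat the cost as a single-variable function $C(z) = (\sigma(z) - y)^2$ of one weighted input $z$, with everything else held fixed, and compare the true change $C(z + \Delta z) - C(z)$ against the first-order estimate $\frac{\partial C}{\partial z}\,\Delta z$.

```python
import math

def sigma(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

# Toy single-variable cost: everything else in the network is held fixed,
# so C depends only on this one weighted input z. The target y is arbitrary.
y = 0.0

def C(z):
    return (sigma(z) - y) ** 2

def dC_dz(z):
    # Analytic derivative: 2*(sigma(z) - y) * sigma'(z),
    # using sigma'(z) = sigma(z)*(1 - sigma(z)).
    s = sigma(z)
    return 2.0 * (s - y) * s * (1.0 - s)

z0 = 0.5      # the original weighted input (z-tilde in the answer)
dz = 1e-4     # the demon's small perturbation Delta z

delta_C_exact  = C(z0 + dz) - C(z0)   # true change in the cost
delta_C_linear = dC_dz(z0) * dz       # first-order Taylor estimate

print(delta_C_exact, delta_C_linear)
```

The two values agree to leading order, and the gap shrinks like $(\Delta z)^2$ as the perturbation gets smaller, which is exactly the "accurate only for small changes" caveat above.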