Understanding the equations of Backpropagation


I was going through the equations of backpropagation in Andrew Ng's Deep Learning course and arrived at this set of equations for a two-layer neural network:

$dZ^{[2]} = A^{[2]} - y$

$dW^{[2]} = \frac{1}{m} \, dZ^{[2]} A^{[1]T}$

$dZ^{[1]} = W^{[2]T} dZ^{[2]} \ast g^{[1]\prime}(Z^{[1]})$

$dW^{[1]} = \frac{1}{m} \, dZ^{[1]} X^{T}$

(where $\ast$ denotes the element-wise product)

Where

$A^{[i]}$ is the matrix of activation values of the $i^{th}$ layer.

$y$ is the target value.

$Z^{[i]}$ is the pre-activation input to the $i^{th}$ layer, i.e. $Z^{[i]} = W^{[i]} A^{[i-1]}$ with $A^{[0]} = X$.

$W^{[i]}$ is the weight matrix between the $(i-1)^{th}$ layer and the $i^{th}$ layer.

$g^{[i]}()$ is the activation function for the $i^{th}$ layer.

$X$ is the input to the neural network.

I've intentionally ignored the bias terms to make it simpler.
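For concreteness, here is a minimal NumPy sketch of the forward and backward pass implementing exactly these four equations, with biases omitted as above. The hidden activation $g^{[1]}$ is assumed to be tanh and the output activation sigmoid; the shapes and names are my own and not from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, W2):
    # X: (n_x, m), W1: (n_h, n_x), W2: (1, n_h) -- shapes assumed for illustration
    Z1 = W1 @ X           # pre-activation of layer 1
    A1 = np.tanh(Z1)      # g^[1] = tanh (my assumption)
    Z2 = W2 @ A1          # pre-activation of layer 2
    A2 = sigmoid(Z2)      # g^[2] = sigmoid
    return Z1, A1, Z2, A2

def backward(X, y, W2, A1, A2):
    m = X.shape[1]
    dZ2 = A2 - y                          # dZ^[2] = A^[2] - y
    dW2 = (1.0 / m) * dZ2 @ A1.T          # dW^[2] = (1/m) dZ^[2] A^[1]T
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)    # tanh'(Z1) = 1 - tanh(Z1)^2, element-wise
    dW1 = (1.0 / m) * dZ1 @ X.T           # dW^[1] = (1/m) dZ^[1] X^T
    return dW1, dW2
```

Note that the `@` products are matrix multiplications, while the `*` in `dZ1` is element-wise.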

I understand that the first equation represents the error at the output layer, and that the second follows from ${\partial E}/{\partial W^{[2]}}$, where $E = -\frac{1}{m} \sum \left[ y \log a^{[2]} + (1 - y) \log(1 - a^{[2]}) \right]$ is the cross-entropy cost, given that the output activation is the sigmoid function.
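To spell out the chain-rule steps I used for the second equation (per example, with the $\frac{1}{m}$ from the average absorbed at the end):

$$
\frac{\partial E}{\partial W^{[2]}}
= \frac{\partial E}{\partial A^{[2]}} \,
  \frac{\partial A^{[2]}}{\partial Z^{[2]}} \,
  \frac{\partial Z^{[2]}}{\partial W^{[2]}},
\qquad
\frac{\partial E}{\partial A^{[2]}} \frac{\partial A^{[2]}}{\partial Z^{[2]}}
= A^{[2]} - y = dZ^{[2]},
\qquad
\frac{\partial Z^{[2]}}{\partial W^{[2]}} = A^{[1]},
$$

which, summed over the $m$ examples, gives $dW^{[2]} = \frac{1}{m} \, dZ^{[2]} A^{[1]T}$.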

I would like to know the formal derivation for the third equation.
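In the meantime, the third equation can at least be verified numerically by comparing the resulting $dW^{[1]}$ against a finite-difference gradient of the cost. This is a sketch under my own assumptions (tanh hidden layer, sigmoid output, no biases; all names are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, W1, W2):
    # Cross-entropy cost of the two-layer net (tanh hidden, sigmoid output).
    A1 = np.tanh(W1 @ X)
    A2 = sigmoid(W2 @ A1)
    m = X.shape[1]
    return -(1.0 / m) * np.sum(y * np.log(A2) + (1 - y) * np.log(1 - A2))

def analytic_dW1(X, y, W1, W2):
    m = X.shape[1]
    A1 = np.tanh(W1 @ X)
    A2 = sigmoid(W2 @ A1)
    dZ2 = A2 - y
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # third equation: W^[2]T dZ^[2] * g^[1]'(Z^[1])
    return (1.0 / m) * dZ1 @ X.T         # fourth equation

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10))
y = (rng.random((1, 10)) > 0.5).astype(float)
W1 = 0.1 * rng.standard_normal((4, 3))
W2 = 0.1 * rng.standard_normal((1, 4))

dW1 = analytic_dW1(X, y, W1, W2)

# Central finite differences, one weight entry at a time.
eps = 1e-6
numeric = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (cost(X, y, Wp, W2) - cost(X, y, Wm, W2)) / (2 * eps)

# The two gradients should agree to within finite-difference error.
print(np.max(np.abs(numeric - dW1)))
```

If the third equation were wrong (for example, missing the element-wise $g^{[1]\prime}(Z^{[1]})$ factor), the two gradients would disagree substantially.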