Calculating derivatives for backpropagation in 2-layer Neural Network


I am trying to calculate the derivatives for backpropagation in a 2-layer neural network with a sigmoid activation function, following the 3Blue1Brown video here and building the network from here.

My calculations differ from those in the second link and I don't know why. I obtain the derivative for the last layer from this formula:

$\frac{\partial C_0}{\partial w_L} = \frac{\partial z_L}{\partial w_L} \frac{\partial a_L}{\partial z_L} \frac{\partial C_0}{\partial a_L}$

and the derivative for the first layer (i.e., the second-to-last) from this formula:

$\frac{\partial C_0}{\partial w_{L-1}} = \frac{\partial z_{L-1}}{\partial w_{L-1}} \frac{\partial a_{L-1}}{\partial z_{L-1}} \frac{\partial z_L}{\partial a_{L-1}} \frac{\partial a_L}{\partial z_L} \frac{\partial C_0}{\partial a_L}$

where:

$C_0 = (a_L - y)^2$

$z_L = w_L \, a_{L-1}$

$a_L = \sigma(z_L)$
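For concreteness, here is a minimal sketch of $\sigma$ and $\sigma'$ in NumPy (the function names are my own, not from either link); the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ is what the tutorial code relies on:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z)), taking z (not the activation) as input
    s = sigmoid(z)
    return s * (1.0 - s)
```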

Summing it all up:

$\frac{\partial C_0}{\partial w_L} = \frac{\partial z_L}{\partial w_L} \frac{\partial a_L}{\partial z_L} \frac{\partial C_0}{\partial a_L} = a_{L-1} \cdot \sigma'(z_L) \cdot 2(a_L - y)$

$\frac{\partial C_0}{\partial w_{L-1}} = \frac{\partial z_{L-1}}{\partial w_{L-1}} \frac{\partial a_{L-1}}{\partial z_{L-1}} \frac{\partial z_L}{\partial a_{L-1}} \frac{\partial a_L}{\partial z_L} \frac{\partial C_0}{\partial a_L} = a_{L-2} \cdot \sigma'(z_{L-1}) \cdot w_L \cdot \sigma'(z_L) \cdot 2(a_L - y)$
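As a sanity check on the chain rule above, here is a scalar sketch (my own toy values, not the tutorial's code) that computes both gradients exactly as in the two formulas; a finite-difference check on the cost confirms them:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical scalar inputs for a 2-layer network
a_Lm2, w_Lm1, w_L, y = 0.5, 0.8, -1.2, 1.0

# Forward pass
z_Lm1 = w_Lm1 * a_Lm2
a_Lm1 = sigmoid(z_Lm1)
z_L = w_L * a_Lm1
a_L = sigmoid(z_L)

# Backward pass, term by term as in the formulas
dC_da_L = 2.0 * (a_L - y)                                   # dC0/da_L
dC_dw_L = a_Lm1 * sigmoid_prime(z_L) * dC_da_L              # dC0/dw_L
dC_dw_Lm1 = (a_Lm2 * sigmoid_prime(z_Lm1)                   # dC0/dw_{L-1}
             * w_L * sigmoid_prime(z_L) * dC_da_L)
```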

  1. How did he get `2*(self.y - self.output)` while I got `2*(self.output - self.y)`? Where is my mistake?

  2. Why is the author doing `sigmoid_derivative(self.output)`?