I am trying to calculate the derivatives for backpropagation in a 2-layer neural network with a sigmoid activation function, following the 3Blue1Brown video here and building the neural network from here.
My calculations differ from the second link and I don't know why. I obtain the derivative for the last layer from this formula:
$\frac{\partial C_0}{\partial w_L} = \frac{\partial z_L}{\partial w_L} \frac{\partial a_L}{\partial z_L} \frac{\partial C_0}{\partial a_L}$
and the derivative for the first layer (the second to last) from this formula:
$\frac{\partial C_0}{\partial w_{L-1}} = \frac{\partial z_{L-1}}{\partial w_{L-1}} \frac{\partial a_{L-1}}{\partial z_{L-1}} \frac{\partial z_L}{\partial a_{L-1}} \frac{\partial a_L}{\partial z_L} \frac{\partial C_0}{\partial a_L}$
where:
$C_0 = (a_L - y)^2$
$z_L = w_L \, a_{L-1}$
$a_L = \sigma(z_L)$
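To make the setup concrete, here is a minimal scalar sketch of these definitions (the variable names `a_L2`, `w_L1`, etc. and all numeric values are my own arbitrary choices, not from either link):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# arbitrary scalar values, just to have something to compute with
a_L2 = 0.5   # a_{L-2}: input to the first layer
w_L1 = 0.8   # w_{L-1}
w_L  = 1.5   # w_L
y    = 1.0   # target

# forward pass following the definitions above
z_L1 = w_L1 * a_L2      # z_{L-1} = w_{L-1} * a_{L-2}
a_L1 = sigmoid(z_L1)    # a_{L-1} = sigma(z_{L-1})
z_L  = w_L * a_L1       # z_L = w_L * a_{L-1}
a_L  = sigmoid(z_L)     # a_L = sigma(z_L)
cost = (a_L - y) ** 2   # C_0 = (a_L - y)^2
```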
Summing it all up:
$\frac{\partial C_0}{\partial w_L} = \frac{\partial z_L}{\partial w_L} \frac{\partial a_L}{\partial z_L} \frac{\partial C_0}{\partial a_L} = a_{L-1} \cdot \sigma'(z_L) \cdot 2(a_L - y)$
$\frac{\partial C_0}{\partial w_{L-1}} = \frac{\partial z_{L-1}}{\partial w_{L-1}} \frac{\partial a_{L-1}}{\partial z_{L-1}} \frac{\partial z_L}{\partial a_{L-1}} \frac{\partial a_L}{\partial z_L} \frac{\partial C_0}{\partial a_L} = a_{L-2} \cdot \sigma'(z_{L-1}) \cdot w_L \cdot \sigma'(z_L) \cdot 2(a_L - y)$
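One way to sanity-check these chain-rule expressions is to compare them against a numerical derivative. This is a sketch with arbitrary scalar values of my own choosing (not taken from either link), using $\sigma'(z) = \sigma(z)(1 - \sigma(z))$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # sigma'(z) = sigma(z) * (1 - sigma(z))

def cost(w_L1, w_L, a_L2=0.5, y=1.0):
    """C_0 as a function of the two weights, for numerical differentiation."""
    a_L1 = sigmoid(w_L1 * a_L2)
    a_L = sigmoid(w_L * a_L1)
    return (a_L - y) ** 2

# arbitrary scalar setup
w_L1, w_L, a_L2, y = 0.8, 1.5, 0.5, 1.0
z_L1 = w_L1 * a_L2
a_L1 = sigmoid(z_L1)
z_L = w_L * a_L1
a_L = sigmoid(z_L)

# analytic gradients from the chain-rule expressions above
dC_dwL  = a_L1 * sigmoid_prime(z_L) * 2 * (a_L - y)
dC_dwL1 = a_L2 * sigmoid_prime(z_L1) * w_L * sigmoid_prime(z_L) * 2 * (a_L - y)

# numerical check with a central difference
eps = 1e-6
num_dwL  = (cost(w_L1, w_L + eps) - cost(w_L1, w_L - eps)) / (2 * eps)
num_dwL1 = (cost(w_L1 + eps, w_L) - cost(w_L1 - eps, w_L)) / (2 * eps)
```

The numerical derivatives agree with the analytic expressions when the cost factor is written as $2(a_L - y)$.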
How did he get `2*(self.y - self.output)` while I got `2*(self.output - self.y)`? Where is my mistake?
Why is the author calling `sigmoid_derivative(self.output)`?