In Andrew Ng's course, we need to compute dz[1] for a neural network with 2 hidden layers.
Why don't we compute da[2] first, in order to apply the chain rule (back-propagation) properly?
Why is this correct:
$\frac{\partial L}{\partial z^{[1]}} = \frac{\partial L}{\partial z^{[2]}} \cdot \frac{\partial z^{[2]}}{\partial a^{[1]}} \cdot \frac{\partial a^{[1]}}{\partial z^{[1]}}$
and this is not?
$\frac{\partial L}{\partial z^{[1]}} = \frac{\partial L}{\partial a^{[2]}} \cdot \frac{\partial a^{[2]}}{\partial z^{[2]}} \cdot \frac{\partial z^{[2]}}{\partial a^{[1]}} \cdot \frac{\partial a^{[1]}}{\partial z^{[1]}}$
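
One way to compare the two expressions is to evaluate both numerically on a tiny example. The sketch below is only an illustration under assumptions that are not stated in the question: one unit per layer, sigmoid activations in both layers, a binary cross-entropy loss, and made-up values for `x`, `y`, `w1`, `b1`, `w2`, `b2`. The function name `compare_chain_rules` is hypothetical as well.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_from_z1(z1, y, w2, b2):
    # Forward pass from z[1] to the loss (assumed: sigmoid + binary cross-entropy).
    a1 = sigmoid(z1)
    z2 = w2 * a1 + b2
    a2 = sigmoid(z2)
    return -(y * math.log(a2) + (1.0 - y) * math.log(1.0 - a2))


def compare_chain_rules(x, y, w1, b1, w2, b2, eps=1e-6):
    # Forward pass.
    z1 = w1 * x + b1
    a1 = sigmoid(z1)
    z2 = w2 * a1 + b2
    a2 = sigmoid(z2)

    # Local derivatives shared by both expressions.
    dz2_da1 = w2
    da1_dz1 = a1 * (1.0 - a1)

    # Three-factor form: start directly from dL/dz[2], which for
    # sigmoid + cross-entropy has the closed form a[2] - y.
    dL_dz2 = a2 - y
    three_factor = dL_dz2 * dz2_da1 * da1_dz1

    # Four-factor form: go through dL/da[2] and da[2]/dz[2] explicitly.
    dL_da2 = -y / a2 + (1.0 - y) / (1.0 - a2)
    da2_dz2 = a2 * (1.0 - a2)
    four_factor = dL_da2 * da2_dz2 * dz2_da1 * da1_dz1

    # Central finite difference on z[1] as an independent reference.
    numeric = (loss_from_z1(z1 + eps, y, w2, b2)
               - loss_from_z1(z1 - eps, y, w2, b2)) / (2.0 * eps)

    return three_factor, four_factor, numeric


if __name__ == "__main__":
    three, four, numeric = compare_chain_rules(
        x=0.5, y=1.0, w1=0.8, b1=-0.1, w2=-1.2, b2=0.3
    )
    print("three-factor:", three)
    print("four-factor: ", four)
    print("numeric:     ", numeric)
```

In this scalar setting the first factor of the three-factor expression is the product of the first two factors of the four-factor expression, so the two results can be compared directly against each other and against the finite-difference estimate.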