What is The Correct Formula For Neural Network Gradient Descent?


Given a neural network with activation function $f(x) = \frac{1}{1+e^{-x}}$, I calculated the gradient of the cost with respect to a weight $W_{ij}^{l}$ at layer $l$, where $A^l$ is the vector of activations of layer $l$, to be

$$\frac{\partial Cost}{\partial W_{ij}^{l}} = \frac{A_{i0}^l(1-A_{i0}^l)A_{j0}^{l-1}}{A_{i0}^l}\sum_{k=0}^{rows(W^{l+1})}\frac{\partial Cost}{\partial W_{ki}^{l+1}}W_{ki}^{l+1}$$

for hidden layers, and

$$\frac{\partial Cost}{\partial W_{ij}^{l}}=2(A^{l}_{i0}-t_{i0})\,A_{i0}^l(1-A_{i0}^l)A_{j0}^{l-1}$$

for the output layer.

Is what I have calculated correct? Most formulas I have seen while researching include neither the fraction nor the $A_{j0}^{l-1}$ factor, which from my understanding means they ignore the product rule. Here is part of the math I used to derive the formulas: formulas
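One way to check such derivations is to compare them against finite differences on a tiny two-layer network. The NumPy sketch below implements both expressions exactly as written above (including the division by $A_{i0}^l$, which is safe here since a sigmoid output is never $0$) and compares them to a central-difference approximation of the cost gradient. All variable names (`W1`, `W2`, `x`, `t`, etc.) are my own for illustration, with a squared-error cost assumed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x  = rng.normal(size=3)          # activations A^{l-1} feeding the hidden layer
W1 = rng.normal(size=(4, 3))     # hidden-layer weights W^l
W2 = rng.normal(size=(2, 4))     # output-layer weights W^{l+1}
t  = rng.normal(size=2)          # targets t

def forward(W1, W2):
    A1 = sigmoid(W1 @ x)         # hidden activations A^l
    A2 = sigmoid(W2 @ A1)        # output activations
    return A1, A2

def cost(W1, W2):                # squared-error cost, assumed here
    _, A2 = forward(W1, W2)
    return np.sum((A2 - t) ** 2)

A1, A2 = forward(W1, W2)

# output-layer formula: dC/dW2_ij = 2(A2_i - t_i) A2_i (1 - A2_i) A1_j
g2 = (2 * (A2 - t) * A2 * (1 - A2))[:, None] * A1[None, :]

# hidden-layer formula, as written in the question:
# dC/dW1_ij = (A1_i (1 - A1_i) x_j / A1_i) * sum_k (dC/dW2_ki) W2_ki
s  = np.sum(g2 * W2, axis=0)                          # the sum over k, per i
g1 = (A1 * (1 - A1) / A1 * s)[:, None] * x[None, :]

# central-difference check of both gradients
def numeric_grad(W, which, eps=1e-6):
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        if which == 1:
            g[idx] = (cost(Wp, W2) - cost(Wm, W2)) / (2 * eps)
        else:
            g[idx] = (cost(W1, Wp) - cost(W1, Wm)) / (2 * eps)
    return g

print(np.max(np.abs(g2 - numeric_grad(W2, 2))))
print(np.max(np.abs(g1 - numeric_grad(W1, 1))))
```

If both printed maxima come out near zero (on the order of the finite-difference error), the formulas agree with the numerical gradient for this cost; a large discrepancy would point to a mistake in the derivation.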