Why does numerical differentiation of a neural network scale as $O(W^2)$?


In the book *Pattern Recognition and Machine Learning* (Bishop), section 5.3.3, the author compares numerical differentiation with backpropagation. A single forward or backward propagation is an $O(W)$ operation, where $W$ is the number of weights, but numerical differentiation is said to cost $O(W^2)$. I don't understand why. Don't we just evaluate $E_n(w_{ji}+\epsilon)$ and $E_n(w_{ji}-\epsilon)$? How does that lead to $O(W^2)$? Can someone please explain? Thanks in advance.
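To make the counting concrete, here is a minimal sketch (a hypothetical toy network, not code from Bishop's book): central differences need two error evaluations *per weight*, and each error evaluation is a full forward pass costing $O(W)$, so the total is $O(W) \times O(W) = O(W^2)$, whereas backprop gets all $W$ derivatives from one $O(W)$ backward sweep.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 4))   # hidden-layer weights
W2 = rng.standard_normal((4, 1))   # output-layer weights
x = rng.standard_normal(3)         # one input
t = np.array([0.5])                # target

forward_passes = 0

def error(W1, W2):
    """One forward pass: O(W) multiply-adds for W total weights."""
    global forward_passes
    forward_passes += 1
    h = np.tanh(x @ W1)
    y = h @ W2
    return 0.5 * np.sum((y - t) ** 2)

def numerical_gradient(W1, W2, eps=1e-6):
    """Central differences: 2 forward passes for EACH of the W weights."""
    grads = []
    for W in (W1, W2):
        g = np.zeros_like(W)
        it = np.nditer(W, flags=["multi_index"])
        for _ in it:
            idx = it.multi_index
            old = W[idx]
            W[idx] = old + eps
            e_plus = error(W1, W2)      # E_n(w + eps): one O(W) pass
            W[idx] = old - eps
            e_minus = error(W1, W2)     # E_n(w - eps): another O(W) pass
            W[idx] = old                # restore the weight
            g[idx] = (e_plus - e_minus) / (2 * eps)
        grads.append(g)
    return grads

num_weights = W1.size + W2.size        # W = 12 + 4 = 16
numerical_gradient(W1, W2)
print(num_weights, forward_passes)     # 16 weights -> 32 forward passes
```

Each of the $2W$ forward passes is itself $O(W)$ work, which is where the $O(W^2)$ total comes from; the finite-difference quotient only looks like two cheap evaluations if you forget that every evaluation re-runs the whole network.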