First-order Taylor expansion formal proof


Studying from Christopher Bishop's Pattern Recognition and Machine Learning book, I came across the following:

$$ \frac{\partial E_n}{\partial w_{ji}^{(l)}} = \frac{E_n(w_{ji}^{(l)} + \epsilon) - E_n(w_{ji}^{(l)})}{\epsilon} + O(\epsilon)$$

I am looking for a formal proof of the above using the definition of big-O notation:

$$ f(x) = O(g(x)) \text{ as } x \to a \iff \exists \delta,M\in\mathbb{R}^{++}.\ 0<|x - a|<\delta \implies|f(x)|\leq M|g(x)| $$

Although $E_n$ is, in the context of the book, the error function of a neural network, we can treat it as an arbitrary complicated non-convex function. Concretely, it is the sum-of-squares error

$$ E_n= \sum_i(\hat y_i - y_i)^2 $$

and

$$ \hat y_i =\sum_kw_{ik}^{(2)} h\left(\sum_{k'}w_{kk'}^{(1)}x_{k'}\right) $$

where $h$ is a non-linear differentiable function.
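For concreteness, the two equations above can be sketched numerically as follows; the layer sizes, the random weights, and the choice $h = \tanh$ are illustrative assumptions on my part, not anything fixed by the book.

```python
import numpy as np

# Illustrative sizes and random weights (assumptions, not Bishop's setup).
rng = np.random.default_rng(0)
x = rng.standard_normal(4)           # inputs x_{k'}
W1 = rng.standard_normal((3, 4))     # first-layer weights w^{(1)}_{kk'}
W2 = rng.standard_normal((2, 3))     # second-layer weights w^{(2)}_{ik}
y = rng.standard_normal(2)           # targets y_i

def E_n(W2):
    """Sum-of-squares error E_n = sum_i (yhat_i - y_i)^2."""
    # yhat_i = sum_k w^{(2)}_{ik} h( sum_{k'} w^{(1)}_{kk'} x_{k'} ), with h = tanh
    y_hat = W2 @ np.tanh(W1 @ x)
    return np.sum((y_hat - y) ** 2)
```

Viewed as a function of any single weight, $E_n$ is smooth, which is what the finite-difference formula above exploits.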

A proof of the big-$O$ term in a first-order Taylor expansion for an arbitrary differentiable function $f$ would therefore answer the question.