I am reading an article on backward propagation, https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ . Let's say I follow the example in the article but use only 3 nodes, with the 1st node the input, the 2nd node the hidden layer, and the 3rd node the output. At the 2nd node the result is $$n_1=I_1*W_1+B_1,$$ which is fed into the logistic function $$O_1=\frac{1}{1+e^{-n_1}},$$ and this output ($O_1$) is then fed into the last node, $$n_2=O_1*W_2+B_2,$$ which is finally fed into the logistic function again to get $$O_2=\frac{1}{1+e^{-n_2}}.$$ The error as a function of $w_1$ can be written as $$E=\frac{1}{2}(target-O_2)^2$$
a) Let's say I initially use $w_1=0.400$ and get the result $E_1$, then use $w_1=0.401$ and get the result $E_2$. Does this represent a numerical approximation of $dE/dw_1$? $$\frac{dE}{dw_1}\approx\frac{E_2-E_1}{0.401-0.400}$$
b) Is $$E(w_1+h)=E(w_1)+h\,\frac{dE}{dw_1}+\frac{1}{2}h^2\frac{d^2E}{dw_1^2}+\cdots$$ a correct representation of the error expanded as a Taylor series around $w_1$?
c) Is it possible to use methods similar to Runge-Kutta to obtain higher-order approximations of the error's derivatives?
Yes, difference quotients like the one in (a) give (first-order) approximations to derivatives.
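Parts (a) and (b) can be checked numerically. Here is a minimal sketch following the question's three-node setup; the constants $I_1$, $W_2$, the biases, and the target are assumed values, only the structure and $w_1=0.4$ come from the question:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Assumed constants; only the three-node structure and w1 = 0.4
# come from the question.
I1, W2, B1, B2, target = 1.0, 0.5, 0.1, 0.1, 0.9

def E(w1):
    """Forward pass: E = 0.5 * (target - O2)^2."""
    O1 = sigmoid(I1 * w1 + B1)
    O2 = sigmoid(O1 * W2 + B2)
    return 0.5 * (target - O2) ** 2

# Exact dE/dw1 from the chain rule, for comparison.
w1 = 0.400
O1 = sigmoid(I1 * w1 + B1)
O2 = sigmoid(O1 * W2 + B2)
exact = -(target - O2) * O2 * (1 - O2) * W2 * O1 * (1 - O1) * I1

# (a) the forward difference quotient from the question, h = 0.001.
forward = (E(0.401) - E(0.400)) / 0.001
# The Taylor series in (b) shows its error is O(h); the symmetric
# quotient cancels the h*E'' term and has error O(h^2).
central = (E(w1 + 0.001) - E(w1 - 0.001)) / 0.002

print(exact, forward, central)
```

Running this, the forward quotient matches the chain-rule value to a few digits and the central quotient typically to several more, just as the truncated Taylor expansion predicts.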
The problem is that you want all the partial derivatives, and the coefficient $W_1$ is usually a matrix. Forward differentiation, whether via divided differences, the complex-step method, or dual numbers, computes only one partial derivative per evaluation; backward differentiation is therefore applied, since it computes all partial derivatives in one sweep.
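To make the one-sweep claim concrete, here is a sketch with a small assumed network (3 inputs, 2 hidden sigmoid nodes, 1 sigmoid output; all the numbers are made up): one backward pass produces every entry of $dE/dW_1$, while a divided difference checks only one entry per extra forward pass.

```python
import math

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# Assumed small network: 3 inputs -> 2 hidden sigmoid nodes -> 1 output.
x  = [0.5, -0.2, 0.1]
W1 = [[0.4, -0.3, 0.2], [0.1, 0.5, -0.4]]   # 2x3: six W1-type weights
b1 = [0.1, -0.1]
W2 = [0.3, -0.6]
b2 = 0.2
target = 0.9

def forward(W1):
    n1 = [sum(W1[i][j] * x[j] for j in range(3)) + b1[i] for i in range(2)]
    O1 = [sigmoid(v) for v in n1]
    O2 = sigmoid(sum(W2[i] * O1[i] for i in range(2)) + b2)
    return O1, O2, 0.5 * (target - O2) ** 2

O1, O2, E = forward(W1)

# One backward sweep: all six entries of dE/dW1 fall out at once.
dE_dn2 = -(target - O2) * O2 * (1 - O2)
dE_dn1 = [dE_dn2 * W2[i] * O1[i] * (1 - O1[i]) for i in range(2)]
dE_dW1 = [[dE_dn1[i] * x[j] for j in range(3)] for i in range(2)]

# A divided difference, by contrast, needs one extra forward pass per
# entry; here we spot-check just dE/dW1[0][1].
h = 1e-6
W1p = [row[:] for row in W1]
W1p[0][1] += h
_, _, Ep = forward(W1p)
print(dE_dW1[0][1], (Ep - E) / h)
```

The two printed values agree closely, which is exactly the kind of spot test mentioned below.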
You can still use difference quotients to spot-check the result of a backpropagation implementation.
Truncated Taylor series can also be used to compute higher-order derivatives. One can combine such Taylor series computed along different directions through the expansion point to obtain higher-order mixed derivatives. The extraction of the derivatives, which amounts to solving a linear system, rapidly becomes ill-conditioned, however.
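As a one-dimensional instance of combining expansions, adding the Taylor series of $E(w_1+h)$ and $E(w_1-h)$ cancels every odd-order term and isolates the second derivative. The sketch below reuses the scalar three-node network with assumed constants, and also illustrates the ill-conditioning when $h$ is pushed too small:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Assumed constants for the scalar three-node network; only the
# structure and w1 = 0.4 come from the question.
I1, W2, B1, B2, target = 1.0, 0.5, 0.1, 0.1, 0.9

def E(w1):
    O1 = sigmoid(I1 * w1 + B1)
    O2 = sigmoid(O1 * W2 + B2)
    return 0.5 * (target - O2) ** 2

# Adding the expansions at +h and -h cancels the odd-order terms:
#   E(w1+h) + E(w1-h) = 2*E(w1) + h^2 * E''(w1) + O(h^4),
# and solving for E'' gives the classic three-point formula.
w1, h = 0.4, 1e-3
second = (E(w1 + h) - 2 * E(w1) + E(w1 - h)) / h**2
print(second)

# The ill-conditioning: shrink h too far and the numerator is mostly
# cancellation noise, which the division by h^2 then amplifies.
tiny = (E(w1 + 1e-8) - 2 * E(w1) + E(w1 - 1e-8)) / 1e-16
print(tiny)  # typically far from `second`, dominated by rounding error
```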