I've been working through the derivation of Halley's method presented here, and attempting to replicate it myself (as well as generalise it to higher orders). Unfortunately, partway through the proof the result seems to simplify much further, and much more easily, than it should, and I can't find my mistake. (Summation convention is used throughout.)
First, find the Newton step (with components denoted $a_i$) from the first-order expansion of the gradient:
$$\nabla_i f(x+a)\approx\nabla_i f(x)+a_j\nabla_j\nabla_i f(x)$$
Defining $a$ as the step that sets this first-order approximation to zero, we obtain:
$$\nabla_i f(x)+a_j\nabla_j\nabla_i f(x)=0$$
This is a linear system that we can solve for $a_j$, as long as we know the Hessian of $f$ and it is invertible.
We then consider setting the second-order approximation to zero:
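To make this step concrete, here is a minimal sketch in Python. The toy function $f(x)=x_0^3+x_0x_1+x_1^3$, the evaluation point $(1,2)$ and the helper `solve2` are my own assumptions for illustration; they are not part of the derivation I'm following. Exact fractions are used so the residual can be checked without rounding.

```python
from fractions import Fraction as F

def grad(x):
    # Gradient of the assumed toy function f(x) = x0^3 + x0*x1 + x1^3.
    return [3 * x[0] ** 2 + x[1], x[0] + 3 * x[1] ** 2]

def hess(x):
    # Hessian of the same toy function.
    return [[6 * x[0], F(1)], [F(1), 6 * x[1]]]

def solve2(A, r):
    # Solve the 2x2 linear system A s = r by Cramer's rule (hypothetical helper).
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(r[0] * A[1][1] - A[0][1] * r[1]) / det,
            (A[0][0] * r[1] - A[1][0] * r[0]) / det]

x = [F(1), F(2)]
g, H = grad(x), hess(x)

# Newton step: grad_i + a_j * H_ji = 0, i.e. H a = -g.
a = solve2(H, [-gi for gi in g])

# Residual of the defining equation; it should vanish component-wise.
residual = [g[i] + sum(H[i][j] * a[j] for j in range(2)) for i in range(2)]
print(a, residual)
```

In more than two dimensions one would replace `solve2` with a general linear solver; the structure of the step stays the same.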
$$\nabla_if(x+\delta x)\approx\nabla_if(x)+\delta x_{j_1}\nabla_{j_1}\nabla_if(x)+\frac{1}{2}\delta x_{j_1}\delta x_{j_2}\nabla_{j_2}\nabla_{j_1}\nabla_if(x)=0$$
Because this is no longer a linear problem, we substitute $a$ from the previous equation for one of the two factors of $\delta x$ in the quadratic term, and define $b$ as the Halley step:
$$\nabla_if(x)+b_{j_1}\nabla_{j_1}\nabla_if(x)+\frac{1}{2}b_{j_1}a_{j_2}\nabla_{j_2}\nabla_{j_1}\nabla_if(x)=0$$
The idea is that we can factor this to recover Halley's method. However, since the step components are constant with respect to the derivatives being taken, they should commute with those derivatives, and the partial derivatives also commute with one another. So we should be able to rewrite this as follows:
$$\nabla_if(x)+b_{j_1}\nabla_{j_1}\nabla_if(x)+\frac{1}{2}b_{j_1}\nabla_{j_1}(a_{j_2}\nabla_{j_2}\nabla_if(x))=0$$
Comparing this with the defining equation for $a$, the bracketed term is just $-\nabla_if(x)$, so the whole expression simplifies drastically:
$$\nabla_if(x)+a_{j_2}\nabla_{j_2}\nabla_if(x)=0$$
$$\nabla_if(x)+b_{j_1}\nabla_{j_1}\nabla_if(x)-\frac{1}{2}b_{j_1}\nabla_{j_1}\nabla_if(x)=0$$
$$\nabla_if(x)+\frac{1}{2}b_{j_1}\nabla_{j_1}\nabla_if(x)=0$$
This looks far more like Newton's method with a rescaled step (in fact it just gives $b_j=2a_j$) than I was expecting.
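As a numerical sanity check, here is a small sketch comparing the two equations for $b$. It assumes a toy function of my own choosing, $f(x)=x_0^3+x_0x_1+x_1^3$ at the point $(1,2)$, and a hypothetical 2x2 helper `solve2`; none of this comes from the derivation I'm following. It solves the equation containing the $\frac{1}{2}b_{j_1}a_{j_2}$ term directly, then checks whether the step implied by the collapsed equation satisfies it.

```python
from fractions import Fraction as F

def solve2(A, r):
    # Solve the 2x2 linear system A s = r by Cramer's rule (hypothetical helper).
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(r[0] * A[1][1] - A[0][1] * r[1]) / det,
            (A[0][0] * r[1] - A[1][0] * r[0]) / det]

# Assumed toy function f(x) = x0^3 + x0*x1 + x1^3, evaluated at x = (1, 2).
x = [F(1), F(2)]
g = [3 * x[0] ** 2 + x[1], x[0] + 3 * x[1] ** 2]   # gradient
H = [[6 * x[0], F(1)], [F(1), 6 * x[1]]]           # Hessian
# Third derivatives: only the (0,0,0) and (1,1,1) entries are non-zero (both 6).
T = [[[F(6) if i == j == k else F(0) for k in range(2)]
      for j in range(2)] for i in range(2)]

# Newton step from H a = -g.
a = solve2(H, [-gi for gi in g])

# Matrix acting on b in the equation with the (1/2) b a term:
# H_ij + (1/2) a_k T_kji.
A = [[H[i][j] + sum(a[k] * T[k][j][i] for k in range(2)) / 2
      for j in range(2)] for i in range(2)]
b = solve2(A, [-gi for gi in g])

# The collapsed equation g + (1/2) H b = 0 would instead give b = 2a.
b_collapsed = [2 * ai for ai in a]

def residual(step):
    # Left-hand side of the equation with the (1/2) b a term, evaluated at `step`.
    return [g[i]
            + sum(H[i][j] * step[j] for j in range(2))
            + sum(step[j] * a[k] * T[k][j][i]
                  for j in range(2) for k in range(2)) / 2
            for i in range(2)]

print(b, residual(b))
print(b_collapsed, residual(b_collapsed))
```

For this example the directly solved $b$ differs from $2a$, and $2a$ leaves a non-zero residual in the equation I started from, so the simplified equation is not equivalent to it here.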
Any suggestions on where my logic is flawed, and how to get this derivation back on track would be greatly appreciated.