I am confused by the answers here and there, so I want to clarify the derivation of Newton's method. Start by taking the first three terms of a Taylor expansion: $$f(x+h)=f(x)+h f'(x)+\frac{1}{2} h^2 f''(x)$$
Now I have seen two ways of deriving Newton's update for $x$.
First way: I want a root of $f$, so I set the expansion to zero and solve for the step $h$: $$f(x+h)=f(x)+h f'(x)+\frac{1}{2} h^2 f''(x) = 0$$
Then the second-order term $\frac{1}{2} h^2 f''(x)$ is dropped, and solving for $h$ gives $h = -f(x)/f'(x)$, i.e. the update $$ x = x - \frac{f(x)}{f'(x)} $$
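This first update rule is easy to try numerically. Here is a minimal sketch; the test function $f(x)=x^2-2$ (root at $\sqrt 2$) and the iteration count are my own illustrative choices:

```python
# Iterate the root-finding update x <- x - f(x)/f'(x).
def newton_root(f, fprime, x, iterations=10):
    for _ in range(iterations):
        x = x - f(x) / fprime(x)
    return x

# Example: find the root of f(x) = x^2 - 2 starting from x = 1.
root = newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x=1.0)
# root converges to sqrt(2) ~ 1.41421356...
```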
Second way: since I want the next point to be a local minimum/maximum, I want to find a point $a$ such that $f'(a) = 0$, expanding $f$ around the current point:
$$ f(x+\Delta x) = f(x) + J(x)\Delta x + \frac{1}{2}\Delta x^T H(x)\Delta x$$
where $J(x)$ is the Jacobian (here the gradient, as a row vector) and $H(x)$ is the Hessian matrix.
Taking the derivative with respect to $\Delta x$ and setting it to zero gives $$\Delta x = -H(x)^{-1}J(x)^T$$
In the one-dimensional case this reads:
$$ y = f(a)+h f'(a)+\frac{1}{2} h^2 f''(a)$$ $$ y' = f'(a) + hf''(a) = 0$$ $$ h = -\frac{f'(a)}{f''(a)} $$ and since $h = x_1 - x_0$, the update is $$ x = x - \frac{f'(x)}{f''(x)} $$
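The 1D optimization update can be sketched the same way; the function $f(x) = x^2 - \ln x$ (strictly convex, with minimizer $x = 1/\sqrt 2$) is my own illustrative choice:

```python
import math

# Iterate the optimization update x <- x - f'(x)/f''(x).
def newton_minimize(fprime, fsecond, x, iterations=20):
    for _ in range(iterations):
        x = x - fprime(x) / fsecond(x)
    return x

# Example: minimize f(x) = x^2 - ln(x), so f'(x) = 2x - 1/x and
# f''(x) = 2 + 1/x^2 > 0. The minimizer is x = 1/sqrt(2).
xmin = newton_minimize(lambda x: 2 * x - 1 / x,
                       lambda x: 2 + 1 / x**2,
                       x=1.0)
```

Note that the iteration only uses $f'$ and $f''$, matching the derivation above: we are really running root finding on $f'$.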
And those two conclusions seem to conflict with each other: one updates with $f/f'$, the other with $f'/f''$. I understand the first derivation clearly, and the second also sort of makes sense to me. Could someone help to explain? Thanks!
There are two different versions of "Newton's method":
Newton's method for root finding can be derived as follows. Given $x^k \in \mathbb R^n$, which is our current best estimate of a root for $g$, we would ideally like to find a vector $\Delta x$ such that $g(x^k + \Delta x) = 0$. But that is too difficult, so we instead compute a vector $\Delta x$ which satisfies $g(x^k) + g'(x^k)\Delta x = 0$. The solution is $\Delta x = - g'(x^k)^{-1} g(x^k)$ (assuming that $g'(x^k)$ is invertible). Then $x^{k+1} = x^k + \Delta x$ is our new and improved estimate for a root of $g$. Note that this derivation makes use of the first-order approximation $g(x^k + \Delta x) \approx g(x^k) + g'(x^k) \Delta x$.
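This root-finding iteration can be sketched in a few lines of NumPy. The system $g(x) = (x_0^2 + x_1^2 - 1,\; x_0 - x_1)$, with a root at $(1/\sqrt 2, 1/\sqrt 2)$, is an illustrative choice of mine, not part of the answer:

```python
import numpy as np

# g and its Jacobian for the illustrative system above.
def g(x):
    return np.array([x[0]**2 + x[1]**2 - 1, x[0] - x[1]])

def g_jacobian(x):
    return np.array([[2 * x[0], 2 * x[1]],
                     [1.0, -1.0]])

x = np.array([1.0, 0.5])
for _ in range(20):
    # Solve g'(x) dx = -g(x) instead of forming the inverse explicitly,
    # which is cheaper and numerically more stable.
    x = x + np.linalg.solve(g_jacobian(x), -g(x))
```

Solving the linear system rather than inverting $g'(x^k)$ is the standard practical form of the update $\Delta x = -g'(x^k)^{-1} g(x^k)$.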
Newton's method for optimization can be derived by using Newton's method for root finding to solve the equation $\nabla f(x) = 0$. Alternatively, Newton's method for optimization can be derived as follows. Given $x^k \in \mathbb R^n$, which is our current best guess for a minimizer of $f$, we would ideally like to find a vector $\Delta x$ which minimizes $f(x^k + \Delta x)$. But that is too difficult, so we instead compute a vector $\Delta x$ which minimizes $f(x^k) + \nabla f(x^k)^T \Delta x + \frac12 \Delta x^T Hf(x^k) \Delta x$. The solution is $\Delta x = - Hf(x^k)^{-1} \nabla f(x^k)$ (assuming that $Hf(x^k)$ is invertible). Then $x^{k+1} = x^k + \Delta x$ is our new and improved estimate for a minimizer of $f$. Note that this derivation is based on the second-order approximation $f(x^k + \Delta x) \approx f(x^k) + \nabla f(x^k)^T \Delta x + \frac12 \Delta x^T Hf(x^k) \Delta x$.
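The optimization version has the same shape in code, with the Hessian playing the role of the Jacobian. The strictly convex test function $f(x) = x_0^2 + x_1^2 + x_0 x_1 + e^{x_0}$ is my own illustrative choice:

```python
import numpy as np

# Gradient and Hessian of f(x) = x0^2 + x1^2 + x0*x1 + exp(x0).
def grad_f(x):
    return np.array([2 * x[0] + x[1] + np.exp(x[0]),
                     2 * x[1] + x[0]])

def hess_f(x):
    return np.array([[2 + np.exp(x[0]), 1.0],
                     [1.0, 2.0]])

x = np.zeros(2)
for _ in range(20):
    # Update x^{k+1} = x^k - Hf(x^k)^{-1} grad f(x^k), computed by
    # solving Hf(x) dx = -grad f(x). This is exactly root finding
    # applied to the equation grad f(x) = 0.
    x = x - np.linalg.solve(hess_f(x), grad_f(x))
# At convergence, grad_f(x) is (numerically) zero; since the Hessian is
# positive definite everywhere here, x is a minimizer.
```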