Newton's method derivation and gradient descent.


I am confused by answers here and there, so I want to clarify the derivation process of Newton's method. Start by taking the first three terms of a Taylor expansion: $$f(x+h)=f(x)+h f'(x)+\frac{1}{2} h^2 f''(x)$$

Now I have seen two ways to derive Newton's update for $x$.

  1. Solve for $h$ such that $f(x+h) = 0$: $$f(x+h)=f(x)+h f'(x)+\frac{1}{2} h^2 f''(x) = 0$$

    Then, after eliminating the $f''(x)$ term and solving for $h$, one finally gets $$ x \leftarrow x - \frac{f(x)}{f'(x)} $$

  2. Since I want the next point to be a local minimum/maximum, I want to find a point $a$ with $f'(a) = 0$, expanding $f$ around the current point. In the multivariate case:

$$ f(x+\Delta x) = f(x) + J(x)\Delta x + \frac{1}{2}\Delta x^T H(x)\Delta x$$

Here $J(x)$ is the Jacobian matrix and $H(x)$ is the Hessian matrix.

So, setting the derivative with respect to $\Delta x$ to zero, $$\Delta x = -H(x)^{-1}J(x)^T$$

In the one-dimensional (scalar) case,

$$ y = f(a)+h f'(a)+\frac{1}{2} h^2 f''(a)$$ $$ y' = f'(a) + hf''(a) = 0$$ $$ h = -\frac{f'(a)}{f''(a)} $$ and since $h = x_1 - x_0$, the update is $$ x \leftarrow x - \frac{f'(x)}{f''(x)} $$
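To make this second derivation concrete, here is a minimal Python sketch of the scalar update $x \leftarrow x - f'(x)/f''(x)$ (the function and parameter names are my own, purely for illustration):

```python
def newton_minimize(fprime, fsecond, x0, tol=1e-10, max_iter=50):
    """Newton's method for 1-D optimization: x <- x - f'(x)/f''(x)."""
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fsecond(x)
        x -= step
        if abs(step) < tol:  # stop once the update is negligible
            break
    return x

# Minimize f(x) = (x - 3)^2 + 1, whose minimizer is x = 3
xmin = newton_minimize(lambda x: 2 * (x - 3), lambda x: 2.0, x0=0.0)
```

For a quadratic like this, the method lands on the exact minimizer in a single step, since the second-order Taylor model is exact.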

These two conclusions seem to conflict with each other. I can follow the first derivation clearly, and the second also sort of makes sense to me. Could someone help explain? Thanks!

Best answer:

There are two different versions of "Newton's method":

  1. Newton's method for root finding solves $g(x) = 0$, where $g: \mathbb R^n \to \mathbb R^n$.
  2. Newton's method for optimization minimizes $f(x)$, where $f: \mathbb R^n \to \mathbb R$.

Newton's method for root finding can be derived as follows. Given $x^k \in \mathbb R^n$, which is our current best estimate of a root for $g$, we would ideally like to find a vector $\Delta x$ such that $g(x^k + \Delta x) = 0$. But that is too difficult, so we instead compute a vector $\Delta x$ which satisfies $g(x^k) + g'(x^k)\Delta x = 0$. The solution is $\Delta x = - g'(x^k)^{-1} g(x^k)$ (assuming that $g'(x^k)$ is invertible). Then $x^{k+1} = x^k + \Delta x$ is our new and improved estimate for a root of $g$. Note that this derivation makes use of the first-order approximation $g(x^k + \Delta x) \approx g(x^k) + g'(x^k) \Delta x$.
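This root-finding iteration can be sketched in a few lines of Python, using NumPy to solve the linear system $g'(x^k)\,\Delta x = -g(x^k)$ at each step (the function names here are illustrative, not from any particular library):

```python
import numpy as np

def newton_root_nd(g, jac, x0, tol=1e-10, max_iter=50):
    """Newton's method for g(x) = 0: solve g'(x) dx = -g(x), then x <- x + dx."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(jac(x), -g(x))  # Newton step from the linearization
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Example system: x^2 + y^2 = 4 and x = y, with a root at (sqrt(2), sqrt(2))
g = lambda v: np.array([v[0]**2 + v[1]**2 - 4, v[0] - v[1]])
jac = lambda v: np.array([[2 * v[0], 2 * v[1]], [1.0, -1.0]])
sol = newton_root_nd(g, jac, [1.0, 2.0])
```

Solving the linear system directly is preferable to forming $g'(x^k)^{-1}$ explicitly, though both express the same update.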

Newton's method for optimization can be derived by using Newton's method for root finding to solve the equation $\nabla f(x) = 0$. Alternatively, Newton's method for optimization can be derived as follows. Given $x^k \in \mathbb R^n$, which is our current best guess for a minimizer of $f$, we would ideally like to find a vector $\Delta x$ which minimizes $f(x^k + \Delta x)$. But that is too difficult, so we instead compute a vector $\Delta x$ which minimizes $f(x^k) + \nabla f(x^k)^T \Delta x + \frac12 \Delta x^T Hf(x^k) \Delta x$. The solution is $\Delta x = - Hf(x^k)^{-1} \nabla f(x^k)$ (assuming that $Hf(x^k)$ is invertible). Then $x^{k+1} = x^k + \Delta x$ is our new and improved estimate for a minimizer of $f$. Note that this derivation is based on the second-order approximation $f(x^k + \Delta x) \approx f(x^k) + \nabla f(x^k)^T \Delta x + \frac12 \Delta x^T Hf(x^k) \Delta x$.
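The optimization version looks almost identical in code: the only change is that the linear system becomes $Hf(x^k)\,\Delta x = -\nabla f(x^k)$. A minimal sketch (again with illustrative names, assuming the gradient and Hessian are supplied by the caller):

```python
import numpy as np

def newton_minimize_nd(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method for minimizing f: solve Hf(x) dx = -grad f(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(hess(x), -grad(x))  # minimizes the quadratic model
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Minimize f(x, y) = (x - 1)^2 + 10*(y + 2)^2, minimizer at (1, -2)
grad = lambda v: np.array([2 * (v[0] - 1), 20 * (v[1] + 2)])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
xmin = newton_minimize_nd(grad, hess, [0.0, 0.0])
```

This makes the relationship in the answer explicit: running `newton_root_nd` on $g = \nabla f$ with Jacobian $Hf$ performs exactly the same iteration as `newton_minimize_nd` on $f$.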