Newton's method has local quadratic convergence: which method is best to start from?


Newton's method has local quadratic convergence. Is there a good method that I can run for a few iteration to get a better domain to start from?


Because your post is tagged optimization, it is not clear whether you mean Newton's method to identify a root or Newton's method to identify a local minimum (or more generally, a stationary point).

If you mean to identify a root of a system of equations $F(x) = 0$, where $x = (x_1, \dots, x_n)$, one way to globalize Newton's method is to formulate the problem as the optimization problem $$ \min_x \ f(x) \quad \text{where} \quad f(x) := \tfrac{1}{2} \|F(x)\|^2, $$ and where $$ F(x) = \begin{bmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{bmatrix}, \qquad \|F(x)\|^2 = \sum_{i=1}^m f_i(x)^2. $$ Now the gradient and Hessian of $f$ can be computed in terms of $F$: $$ \nabla f(x) = J(x)^T F(x), \qquad \nabla^2 f(x) = J(x)^T J(x) + \sum_{i=1}^m f_i(x) \nabla^2 f_i(x), $$ where $J$ is the Jacobian of $F$, i.e., the matrix whose $i$-th row is $\nabla f_i(x)^T$.
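As an illustration, here is a minimal sketch of the merit function and its gradient. The system $F$ below is an assumption chosen for concreteness (it is not part of the question): a circle intersected with a line.

```python
import numpy as np

# Illustrative system F(x) = 0 (an assumption, chosen for concreteness):
#   f1(x) = x0^2 + x1^2 - 1,   f2(x) = x0 - x1
def F(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def J(x):
    # Jacobian of F: row i is the gradient of f_i
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0,        -1.0]])

def f(x):
    # Merit function f(x) = (1/2) ||F(x)||^2
    Fx = F(x)
    return 0.5 * Fx @ Fx

def grad_f(x):
    # Gradient of the merit function: J(x)^T F(x)
    return J(x).T @ F(x)
```

A quick sanity check against finite differences of $f$ confirms the formula $\nabla f = J^T F$.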

You can apply Newton's method to $f$ and globalize it using a linesearch, e.g., an Armijo linesearch. The procedure is as follows:

  1. At $x_k$, compute Newton's direction by solving $\nabla^2 f(x_k) d_k = -\nabla f(x_k)$. Because of the form of $\nabla^2 f(x_k)$, it may be necessary to modify it (e.g., by adding a multiple of the identity) to ensure that $d_k$ is a descent direction. Locally, assuming you are close to a minimizer, no modification should be necessary.
  2. Compute a step size $t \in (0, 1]$ such that $$ f(x_k + t d_k) \leq f(x_k) + \alpha t \nabla f(x_k)^T d_k, $$ for a fixed parameter $\alpha \in (0, 1)$ (e.g., $\alpha = 10^{-4}$), starting from $t = 1$ and halving $t$ until the inequality is satisfied. If $d_k$ is a descent direction, this procedure is guaranteed to succeed.
  3. Set $x_{k+1} = x_k + t d_k$.
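The three steps above can be sketched as follows. This is a hedged sketch, not a production implementation: the example system `F`, the starting point, and the diagonal shift `tau` used to force a descent direction are all assumptions for illustration.

```python
import numpy as np

def F(x):
    # Illustrative system (an assumption, not from the original post):
    # circle x0^2 + x1^2 = 1 intersected with the line x0 = x1
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def J(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0,        -1.0]])

def hess_f(x):
    # Hessian of the merit function: J^T J + sum_i f_i(x) Hess(f_i).
    # For this example, Hess(f1) = 2I and Hess(f2) = 0.
    Fx = F(x)
    return J(x).T @ J(x) + Fx[0] * 2.0 * np.eye(2)

def damped_newton(x0, alpha=1e-4, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        g = J(x).T @ Fx                    # gradient of the merit function
        if np.linalg.norm(g) < tol:
            break
        H = hess_f(x)
        # Step 1: Newton direction; shift H by tau*I until d is a
        # descent direction (no shift needed when H is positive definite).
        tau = 0.0
        while True:
            try:
                d = np.linalg.solve(H + tau * np.eye(2), -g)
                if g @ d < 0:
                    break
            except np.linalg.LinAlgError:
                pass
            tau = max(2.0 * tau, 1e-3)
        # Step 2: Armijo backtracking, starting from t = 1 and halving.
        fx = 0.5 * Fx @ Fx
        t = 1.0
        while 0.5 * F(x + t * d) @ F(x + t * d) > fx + alpha * t * (g @ d):
            t *= 0.5
        # Step 3: take the step.
        x = x + t * d
    return x
```

For this system the roots are $x = \pm(1/\sqrt{2}, 1/\sqrt{2})$, and from a reasonable starting point the iteration converges to one of them.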

The advantage of this approach is that if you converge to a minimizer where $\nabla^2 f$ is positive definite, you retain Newton's local quadratic convergence, because asymptotically the linesearch accepts $t = 1$ at every iteration.

The drawback of this approach is that you may converge to a stationary point $x^*$ of $f$ (perhaps even a minimizer) where $F(x^*) \neq 0$. This can only happen if $J(x^*)$ is rank deficient. If you can ensure that $J$ always has full rank, the procedure is sound. If not, and you hit a spurious solution, one workaround is to restart the minimization from another initial guess. Another workaround is to apply a deflation technique to remove this minimizer from the problem.
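A one-dimensional illustration of such a spurious stationary point (this scalar example is an assumption for illustration): $F(x) = x^2 + 1$ has no real root, yet the merit function $f(x) = \tfrac{1}{2}F(x)^2$ has a minimizer at $x = 0$, precisely where $J(x) = F'(x) = 2x$ vanishes.

```python
# Scalar example of a spurious stationary point (illustrative).
def F(x):
    # F has no real root: F(x) >= 1 for all x
    return x**2 + 1.0

def grad_f(x):
    # gradient of f = (1/2) F^2 is J^T F = (2x)(x^2 + 1);
    # it vanishes at x = 0 even though F(0) = 1 != 0
    return 2.0 * x * F(x)
```

A minimizer of $f$ here tells you nothing about a root of $F$, which is exactly the rank-deficiency caveat above.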

If you were talking about Newton's method for optimization, the procedure is identical except that you would use the objective $f$ defined in your problem.

Most optimization methods are local in nature and must be globalized, e.g., using a linesearch. Other globalization strategies exist (trust regions, homotopy, etc.), but the idea is the same.