On page 487 of Boyd & Vandenberghe's Convex Optimization, the convergence analysis of Newton's method (Algorithm 9.5) is based on the backtracking line search. My questions concern (9.33) on page 488. The assumption is $\eta \leq m^2/L$, where $m$ is the strong convexity constant and $L$ is the Lipschitz constant of the Hessian $\nabla^2 f(x)$.
In the paragraph under (9.33), it says that if $\left\|\nabla f\left(x^{(k)}\right)\right\|_{2}<\eta$, then $\left\|\nabla f\left(x^{(k+1)}\right)\right\|_{2}<\eta$ because $\eta\leq m^2/L$. Why is this? How do we get this bound at iteration $k+1$? That is, why does $\left\|\nabla f\left(x^{(l)}\right)\right\|_{2}<\eta$ hold for all $l\geq k$?
Also, at the end of that paragraph, it says: "Therefore for all $l\geq k$, the algorithm takes a full Newton step $t=1$." Why is $t^{(k)}=1$ once $\left\|\nabla f\left(x^{(k)}\right)\right\|_{2}<\eta$, rather than a number less than $1$?
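For reference, here is (9.33) as I read it (quoting from memory, so worth double-checking against the text): whenever $\left\|\nabla f\left(x^{(k)}\right)\right\|_{2}<\eta$, the backtracking line search selects a unit step $t^{(k)}=1$ and

$$\frac{L}{2m^{2}}\left\|\nabla f\left(x^{(k+1)}\right)\right\|_{2} \leq \left(\frac{L}{2m^{2}}\left\|\nabla f\left(x^{(k)}\right)\right\|_{2}\right)^{2}.$$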
For the first question: let $g(t)=\|\nabla f(x+t\Delta x_{nt})\|_2^2$ for $t\ge 0$. Since $\Delta x_{nt}=-\nabla^2 f(x)^{-1}\nabla f(x)$, we have
$$ g'(t) = 2\nabla f(x+t\Delta x_{nt})^T \nabla^2 f(x+t\Delta x_{nt})\,\Delta x_{nt}, $$
so
$$ \begin{aligned} g'(0) &= 2\nabla f(x)^T\nabla^2 f(x)\Delta x_{nt}\\ &=2\nabla f(x)^T\nabla^2 f(x)\left(-\nabla^2 f(x)^{-1}\nabla f(x)\right)\\ &=-2\nabla f(x)^T\nabla f(x)\\ &<0. \end{aligned} $$
Since $g'$ is continuous and $g'(0)<0$, there exists $t>0$ such that $g(t)<g(0)$, i.e., $\|\nabla f(x^{(k)}+t\Delta x^{(k)}_{nt})\|_2^2=\|\nabla f(x^{(k+1)})\|_2^2<\|\nabla f(x^{(k)})\|_2^2$. But this argument never uses $\eta\le m^2/L$ (the condition plays no role in the derivation above), and it only addresses the first question. The second remains: why is $t=1$ for all $l\geq k$?
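For what it's worth, the condition $\eta\le m^2/L$ enters through (9.33) itself: granting that the search accepts $t=1$, (9.33) gives $\|\nabla f(x^{(k+1)})\|_2 \le \frac{L}{2m^2}\|\nabla f(x^{(k)})\|_2^2 < \frac{L}{2m^2}\,\eta^2 \le \eta/2 < \eta$, so the bound propagates by induction to all $l\ge k$. Below is a quick numerical sketch of this two-phase behavior (my own toy 1-D example, not from the book): Newton's method with backtracking on the strongly convex function $f(x)=e^x-x$. Started far from the optimum, the first step is damped ($t<1$); once the gradient is small enough, every step is a full Newton step $t=1$ and the gradient norm decreases roughly quadratically.

```python
import numpy as np

np.seterr(over="ignore")  # large trial steps overflow exp(); the resulting inf
                          # simply fails the sufficient-decrease test, which is fine

# Toy 1-D problem (my choice, not from the book):
# f(x) = e^x - x, with f'(x) = e^x - 1 and f''(x) = e^x > 0.
f = lambda x: np.exp(x) - x
g = lambda x: np.exp(x) - 1.0
h = lambda x: np.exp(x)

def newton_backtracking(x, alpha=0.25, beta=0.5, tol=1e-12, max_iter=50):
    """Newton's method with backtracking line search (in the spirit of Alg. 9.5)."""
    history = []  # (|f'(x_k)|, accepted step size t_k) per iteration
    for _ in range(max_iter):
        if abs(g(x)) < tol:
            break
        dx = -g(x) / h(x)      # Newton step
        lam2 = -g(x) * dx      # lambda^2 = f'(x)^2 / f''(x)
        t = 1.0
        # Backtrack until f(x + t*dx) <= f(x) - alpha * t * lambda^2.
        while f(x + t * dx) > f(x) - alpha * t * lam2:
            t *= beta
        history.append((abs(g(x)), t))
        x += t * dx
    return x, history

x_star, hist = newton_backtracking(-6.0)
for gnorm, t in hist:
    print(f"|f'(x)| = {gnorm:.3e}   t = {t}")
```

On this run the first iteration backtracks to a small $t$, and all later iterations take $t=1$ with the gradient norm roughly squared each time, matching the book's damped phase / quadratically convergent phase picture.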