Convergence proof on Newton's method from Boyd & Vandenberghe's book on convex optimization

157 Views Asked by At

From Boyd & Vandenberghe's book, page 491:

Applying the Lipschitz condition,we have $$\begin{aligned} \left\|\nabla f\left(x^{+}\right)\right\|_{2} &=\left\|\nabla f\left(x+\Delta x_{\mathrm{nt}}\right)-\nabla f(x)-\nabla^{2} f(x) \Delta x_{\mathrm{nt}}\right\|_{2} \\ &=\left\|\int_{0}^{1}\left(\nabla^{2} f\left(x+t \Delta x_{\mathrm{nt}}\right)-\nabla^{2} f(x)\right) \Delta x_{\mathrm{nt}} d t\right\|_{2} \\ & \leq \frac{L}{2}\left\|\Delta x_{\mathrm{nt}}\right\|_{2}^{2} \\ &=\frac{L}{2}\left\|\nabla^{2} f(x)^{-1} \nabla f(x)\right\|_{2}^{2} \\ & \leq \frac{L}{2 m^{2}}\|\nabla f(x)\|_{2}^{2} \end{aligned}$$ to get the inequality assumption (9.33), which is:

$$\frac{L}{2 m^{2}}\left\|\nabla f\left(x^{(k+1)}\right)\right\|_{2} \leq\left(\frac{L}{2 m^{2}}\left\|\nabla f\left(x^{(k)}\right)\right\|_{2}\right)^{2}$$

I dont understand how he get the first two lines of the derivation?

My understanding is that $\nabla f\left(x^{+}\right)$ equals $\nabla f\left(x+\Delta x_{\mathrm{nt}}\right)$ already when t=1, so where do the rest of two terms come from? please anyone help me, really appreciate.

2

There are 2 best solutions below

0
On BEST ANSWER

I agree with Deepak's point, but I will expand his answer.

As defined on Page 484, the Newton step is $$ \Delta x_{\mathrm{nt}}=-\nabla^{2} f(x)^{-1} \nabla f(x) $$ So, $$ \begin{aligned} -\nabla f(x)-\nabla^{2} f(x) \Delta x_{nt} &=-\nabla f(x)-\nabla^{2} f(x)( -\nabla^{2} f(x)^{-1} \nabla f(x))\\ &=-\nabla f(x)+\nabla^{2} f(x)\nabla^{2} f(x)^{-1} \nabla f(x)\\ &=0 \end{aligned} $$

The above answered the fist line of the derivation in the question.

For the second line, since there is no $t$ in $\nabla^{2} f(x) \Delta x_{nt}$, we have

$$ \int_0^1 \nabla^{2} f(x) \Delta x_{nt}dt = \nabla^{2} f(x) \Delta x_{nt}(1-0) = \nabla^{2} f(x) \Delta x_{nt} $$

As we know that $\nabla_t(\nabla f(x+t\Delta x_{nt}))=\nabla^2f(x+t\Delta x_{nt})\Delta x_{nt}$, we have the following derivation $$ \begin{aligned} \int_0^1 \nabla^2f(x+t\Delta x_{nt})\Delta x_{nt}dt &=\int_0^1 d(\nabla f(x+t\Delta x_{nt})+c)\\ &=\nabla f(x+t\Delta x_{nt})+c|_0^1\\ &=(\nabla f(x+\Delta x_{nt})+c)-(\nabla f(x)+c)\\ &=0 \end{aligned} $$ where c is a constant, and the integral interval is $[0,1]$ because t is supposed to be in that range.

0
On

$-\nabla f(x)-\nabla^{2} f(x) \Delta x_{\mathrm{nt}}=0$ since $\Delta x _{nt}$ is the Newton search direction.