Properties of norms of inverse Hessians near a solution

361 Views Asked by At

In theorem 3.5 of the book on Numerical Optimization by Jorge Nocedal and Stephen J Wright, Second edition, is given the proof for the quadratic convergence to the solution of a sequence of iterates generated by Newton's method. In the course of the proof is mentioned the following line:

"Since $\nabla^2 f(x^*)$ is nonsingular, there is a radius $r > 0$ such that $\lVert {\nabla^2 f_k ^{-1}}\rVert ^2 \leq 2\lVert {\nabla^2 f ^{-1}(x^*)}\rVert ^2$ for all $x_k$ with $\lVert x_k - x^*\rVert \leq r$ "

I don't quite understand how this statement holds. Could someone provide the proof for the above statement or a link that directs me towards an explanation of the statement.

Thank you!

2

There are 2 best solutions below

1
On

Sadly, I don't know of a good reference that actually runs through the full proof of this theorem. The snarky answer is that it happens because we assume that $\nabla^2 f(x)$ is Lipschitz continuous in the neighborhood of $x^*$, but the real answer is significantly more complicated. Here goes

Theorem 1 Assume that

  1. $X=\mathbb{R}^m$
  2. $A\in\mathscr{L}(X)$
  3. $\|A\| < 1$

Then

  1. $I-A$ is invertible
  2. $\|(I-A)^{-1}\|\leq\frac{1}{1-\|A\|}$

Proof Omitted, but uses a series representation of $(I-A)^{-1}$. See An Introduction to Hilbert Space by N. Young Theorem 7.10. $\square$.

Theorem 2 Assume that

  1. $X=\mathbb{R}^m$
  2. $A\in\mathscr{L}(X)$ and $A$ is nonsingular
  3. $B\in\mathscr{L}(X)$
  4. $\|A^{-1}(B-A)\| < 1$

Then

  1. $B$ is nonsingular
  2. $\|B^{-1}\| \leq \frac{\|A^{-1}\|}{1-\|A^{-1}(B-A)\|}$

Proof Omitted, but uses Theorem 2. $\square$

Theorem 3 Assume that

  1. $X=\mathbb{R}^m$
  2. $F^\prime(x)$ exists and $F^\prime:X\rightarrow \mathscr{L}(X)\in \mathrm{Lip}_\gamma(N_r(x))$
  3. $F^\prime(x)^{-1}$ exists and $\beta=\|F^\prime(x)^{-1}\|$
  4. $\|\delta x\| \leq \min\left\{r,\frac{1}{2\beta\gamma}\right\}$ (Basically, $x+\delta x\in N_r(x)$, but possibly closer depending on $\beta$ and $\gamma$.)

Then

  1. $F^\prime(x+\delta x)^{-1}$ exists
  2. $\|F^\prime(x+\delta x)^{-1}\| \leq 2\|F^\prime(x)^{-1}\|$

Proof

$$\begin{array}{rll} &\|F^\prime(x)^{-1}(F^\prime(x+\delta x)-F^\prime(x))\|\\ \leq& \|F^\prime(x)^{-1}\|\|F^\prime(x+\delta x)-F^\prime(x)\| & \textrm{Submultiplicative norm}\\ =& \beta \|F^\prime(x+\delta x) - F^\prime(x)\| & \textrm{Assumption 3}\\ \leq& \beta \gamma \|x+\delta x-x\| & \textrm{Assumption 2}\\ =& \beta \gamma \|\delta x\|\\ \leq& \frac{1}{2} & \textrm{Assumption 4} \end{array}$$

Therefore, from Theorem 2, $F^\prime(x+\delta x)^{-1}$ exists. This proves 1. Also, from Theorem 2

$$ \|F^\prime(x+\delta x)^{-1}\|\leq \frac{\|F^\prime(x)^{-1}\|}{1-\|F^\prime(x)^{-1}(F^\prime(x+\delta x) - F^\prime(x))\|} \leq \frac{\|F^\prime(x)^{-1}\|}{1-1/2} = 2 \|F^\prime(x)^{-1}\| $$

This proves 2. $\square$

Ok, so you get the result we want by noting that $F(x) = \nabla f(x)$, which means that $F^\prime(x) = \nabla^2 f(x)$. In addition, we have that $x_k$ can be written as $x_k = x^* + \delta x_k$.

Anyway, it turns out the Newton's method quadratic convergence proof is a quite a bit longer than what most references make it out to be if you actually run through all of the details. No, we don't have to go back to the axiom of choice, but this result is kind of important and rarely given. That said, if someone has a good reference for stuff like this, I'd like to know.

0
On

I'll complete @wyer33 's omitted proofs of Theorems.

Theorem 1 Assume that

  1. $=ℝ^m$
  2. $∈ℒ()$
  3. $‖‖<1$

Then

  1. $−$ is invertible
  2. $‖(−)^{-1}‖≤\frac{1}{1−‖‖}$

Proof
For $\|A\| < 1$, we have $$\left\|\sum_{i=0}^nA^i\right\| \leq\sum_{i=0}^n\left\|A^i\right\| \leq\sum_{i=0}^n\left\|A\right\|^i =\frac{1-\|A\|^{n+1}}{1-\|A\|}$$

Thus
$$\|(I-A)^{-1}\| = \left\|\sum_{i=0}^{\infty}A^i\right\| = \lim_{n\to\infty}\left\|\sum_{i=0}^nA^i\right\| \leq \lim_{n\to\infty} \frac{1-\|A\|^{n+1}}{1-\|A\|} =\frac{1}{1-\|A\|}$$

This proves Theorem 1.◻

Theorem 2 Assume that

  1. $ = ℝ^m$
  2. $ ∈ ℒ()$ and $$ is nonsingular
  3. $∈ℒ()$
  4. $‖^{−1}(−)‖<1$

Then

  1. $$ is nonsingular
  2. $‖^{−1}‖\leq\frac{‖^{−1}‖}{1−‖^{−1}(−)‖}$

Proof
\begin{align} \|B^{-1}\| &= \|(A-AA^{-1}(A-B))^{-1}\| \\ &\leq \|(I - A^{-1}(A-B))^{-1}A^{-1}\| \\ &\leq \|(I - A^{-1}(A-B))^{-1}\|\|A^{-1}\| \\ \text{by assumption 4 and using } \textbf{Theorem 1 } & \leq \frac{1}{1−‖^{−1}(A−B)‖}\|A^{-1}\| \\ & \leq \frac{1}{1−‖^{−1}(−)‖}\|A^{-1}\| \end{align} This proves Theorem 2.◻