Proof of Theorem 4.5 about Trust Region algorithm convergence in Nocedal & Wright

60 Views Asked by At

I am reading Numerical Optimization by Nocedal & Wright, and I am having trouble understanding the proof of Theorem 4.6. I have been stuck on this theorem for many hours, so any help is greatly appreciated.

More specifically, I it is hard for me to follow:

  1. How $(13)$ to $(17)$ are derived.
  2. The intuition of the final contradiction.

The theorem and the proof follows.

$$\min_{p \in \mathbb{R}^n} m_k (p) = f_k + g_k^Tr + \frac{1}{2} p^T B p \quad \text{s.t.} \quad \| p\| \leq \Delta_k \tag{1}$$

$$S = \{ x \colon f(x) \leq f(x_0)\} \tag{2}$$ $$S(R_0) = \{ x \colon \|x-y \|<R_0 \: \text{for some} \: r \in S \} \tag{3}$$

Theorem 4.5: Let $\eta=0$ in Algorithm 4. Suppose, that $\|B_k \| \leq \beta$ for some constant $\beta$, that $f$ is bounded below on the level set $S$ and Lipschitz continuously differentiable in the neighborhood $S(R_0)$ for some $R_0 > 0$, and that all approximate solutions of $\min_{p \in \mathbb{R}^n} m_k (p)$ satisfy the inequalities

$$m_k(0) -m_k(p_k) \geq c_1 \| g_k\| \min \left( \Delta_k, \frac{\| g_k\|}{\| B_k \|}\right)\tag{4}$$ and

$$\|p_k\| \leq \gamma \Delta_k,\tag{5}$$ for some constants $c_1 \leq 1$ and $\gamma \geq 1$. We then have

$$\lim_{k \to \infty } \inf \|g_k\| = 0.\tag{6}$$

Proof: Let $$\rho_k = \frac{f(x_k)-f(x_k + p_k)}{m_k(0)-m_k(p_k)}.\tag{7}$$ Then, we have

$$|\rho_k -1 | = \left|\frac{m(p_k)-f(x_k + p_k)}{m_k(0)-m_k(p_k)} \right|.\tag{8}$$ Using Taylor expansion and $\|B_k \| \leq \beta$ we get

$$|m_k(p_k) -f(x_k + p_k)| \leq \frac{\beta}{2}\|p_k\|^2 + \beta_1 \| p_k\|^2\tag{9}$$ where $\beta_1$ is t Lipschitz constant for $g$ on the set $S(R_0)$ and assumed that $\|p_k \|\leq R_0$ to ensure that $x_k$ and $x_k+t p_k$ both lie in the set $S(R_0)$. Suppose for contradiction that there is $\epsilon > 0$ and positive index $K$ such that

$$\| g_k\| \geq \epsilon,\tag{10}$$ for all $k \geq K$. From $(4)$, we have for $k \geq K$ that

$$m_k(0) -m_k(p_k) \geq c_1 \epsilon \min \left( \Delta_k,\frac{\epsilon}{\beta}\right).\tag{11}$$ Using, $(11)$, $(9)$, and the bound $\| p_k\| \leq \gamma \Delta_k$, we have

$$|\rho_k -1| \leq \frac{\gamma^2 \Delta_k^2 (\beta/2+\beta_1)}{c_1 \epsilon \min (\Delta_k, \epsilon/\beta)}.\tag{12}$$

We now derive a bound on the right-hand-side that holds for all sufficiently small values of $\Delta_k$, that is, for all $\Delta_k \leq \bar{\Delta}$, where $\bar{\Delta}$ is defines as follows:

$$\bar{\Delta} = \min \left( \frac{1}{2} \frac{c_1 \epsilon}{\gamma^2(\beta/2 +\beta_1)}, \frac{R_0}{\gamma}\right).\tag{13}$$

The $R_0 / \gamma$ term in this definition ensures that the bound in $(9)$ is valid (because $\| p_k\| \leq \gamma \Delta_k \leq \bar{\Delta} \leq R_0$). Note that since $c_1 \leq 1$ and $\gamma \geq 1$, we have $\bar{\Delta} \leq \epsilon/\beta$. The latter condition implies that for all $\Delta_k \in [0, \bar{\Delta}]$, we have $\min(\Delta_k, \epsilon/\beta) =\Delta_k$, so from $(12)$ and $(13)$, we have

$$|\rho_k -1| \leq \frac{\gamma^2 \Delta_k (\beta/2 + \beta_1)}{c_1 \epsilon} \leq \frac{\gamma^2 \bar{\Delta} (\beta/2 + \beta_1)}{c_1 \epsilon} \leq \frac{1}{2}.\tag{14}$$ Therefore

$$\rho_k > \frac{1}{4},\tag{15}$$ and so by the workings of Algorithm 4.1 we have $\Delta_{k+1} \geq \Delta_k$ whenever $\Delta_k$ fall below the threshold $\bar{\Delta}$. It follows that reduction of $\Delta_k$ (by a factor of $1/4$) can occur in Algorithm 4.1 only if $\Delta_k \geq \bar{\Delta}$ and therefore we conclude that

$$\Delta_k \geq \min (\Delta_k, \bar{\Delta} /4)\tag{16}$$ for all $k \geq K$. Suppose now that there is an infinite sub-sequence $\mathcal{K}$ such that $\rho_k \geq 1/4$ for $k \in \mathcal{K}$. For $k \in \mathcal{K}$ and $k \geq K$, we have from $(11)$ that

$$f(x_k) - f(x_{k+1}) = f(x_k) - f(x_k + p_k) \geq \frac{1}{4} (m_k(0) - m_k (p_k)) \geq \frac{c_1 \epsilon}{4} \min (\Delta_k, \epsilon / \beta).\tag{17}$$ Since $f$ is bounded bellow, it follows this inequality that

$$\lim_{k \in \mathcal{K}, k \to \infty} \Delta_k =0, \tag{18}$$

contradicting $(16)$. Hence no such infinite sub-sequence $\mathcal{K}$ can exist and we must have $\rho_k < 1/4$ for sufficiently large $k$. In this case, $\Delta_k$ will eventually be multiplied by $1/4$ at every iteration and we have $\lim_{k \in \mathcal{K}, k \to \infty} \Delta_k =0 $, which again contradicts $(10)$. Hence, the original assertion must be false, giving $(6)$.