Convergence to local minimum of gradient descent with exact line search for $C^1$ function

Question

Convergence to local minimum of gradient descent with exact line search for $C^1$ function

1k Views Asked by Bumbble Comm At 10 May 2026 - 7:20

Here is a Theorem ($3.1$) from "Introduction to Optimization" by Boris Polyak.

Let $f : \mathbb{R}^n \to \mathbb R$ be a continuously differentiable function and let $\{x: f(x) \le f(x^0)\}$ be compact. Let $\{x_k\}_{k=0}^{\infty}$ be a sequence generated by following procedure \begin{align*} x^{k+1} = x^k - \gamma_k \nabla f(x^k), \end{align*} where \begin{align*} \gamma_k = \text{argmin}_{\gamma \ge 0} f\left(x^k - \gamma \nabla f(x^k)\right). \end{align*} Then $\nabla f(x^k) \to 0$ and the sequence $\{x^k\}$ has limit points each of which is stationary, i.e., we can find the subsequence $x^{k_i} \to x^*$ with $\nabla f(x^*) =0$.

The proof is omitted in the book. The usually proof I have seen in the literature assumping that the gradient satisfies a Lipschitz condition, i.e., $\| \nabla f(x) - \nabla f(y)\| \le L \|x-y\|$ for some constant $L$ and all $x, y$ and then by arguing there is some constant decrease in function value in each iteration. Combining the fact the function is bounded below, we get $\sum_{i=0}^{\infty} \| \nabla f(x^i)\| < \infty$ and consequently $\lim \|\nabla f(x^k)\| \to 0$.

In mimicking this kind proof, the difficulty is the decrease is not constant. Here is my attemp: If $\| \nabla f(x^k)\| \not\to 0$, then $\limsup_{k \to \infty} \|\nabla f(x^k)\| = \delta > 0$. Let $\{x^{k_j}\}$ be a subsequence realizing this limit. For each $k_j$, \begin{align*} f(x^{k_{j+1}}) = f(x^{k_j}) - \gamma_{k_j}^2 \|\nabla f(x^{k_j})\| + o(\gamma_{k_j} \|\nabla f(x^{k_j})\|), \end{align*} by Taylor expansion. Since $\nabla f(x^{k_j}) \neq 0$, $\gamma_{k_j} \neq 0$ we know that $f(x^{k_{j+1}}) < f(x^{k_j})$, we get $\sum_{j=1}^{\infty} \gamma_{k_j} \|\nabla f(x^{k_j})\|^2 < \infty$. But since we cannot control the stepsize, I don't know how to proceed.

Another question is why Polyak states we can find a subsequence with $x^{k_j} \to x^*$. It seems to me the sequence should be convergent to some local minimum. What am I missing here?

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

Here is a proof that does not need Lipschitz continuity.

Proof: We know $\{f(x_k)\}_{k=1}^{\infty}$ is nonincreasing and so $x_k$ is in the compact set $C = \{x : f(x) \leq f(x_0)\}$ for all $k \in \{0, 1, 2, ...\}$. Continuity of $f$ implies that, over this compact set $C$, there is a finite value $f_{min}$ such that $f(x_k)\geq f_{min}$ for all $k$ and so $f(x_k)$ converges to a finite value and $$ \lim_{k\rightarrow \infty} [f(x_{k+1})-f(x_k)] = 0 \quad (Eq. *)$$

Suppose $||\nabla f(x_k)||$ does not converge to $0$ (we reach a contradiction). Hence, there is an $\epsilon>0$ such that $||\nabla f(x_k)||\geq \epsilon$ for an infinite number of the $x_k$ vectors. These $x_k$ vectors are all contained in the compact set $C$, so by the Bolzano-Wierstrass theorem there is a subsequence $x_{k[i]}$ such that $||\nabla f(x_{k[i]})||\geq \epsilon$ for all $i \in \{1, 2, 3, ...\}$ and $\lim_{i\rightarrow \infty} x_{k[i]} = z$ for some point $z\in C$. By continuity we know $||{\nabla f(z)}||\geq \epsilon$. Since $f$ is differentiable at $z$ we have for all $\gamma>0$ $$ f(z-\gamma \nabla f(z)) = f(z) + \underbrace{(\nabla f(z))^T (-\gamma \nabla f(z))}_{-\gamma ||\nabla f(x)||^2} + g(\gamma) $$ where $g$ is a function that satisfies $\lim_{\gamma\rightarrow 0^+} g(\gamma)/\gamma=0$. So there is a $\gamma^*>0$ such that $g(\gamma^*)/\gamma^* \leq ||\nabla f(z)||^2/2$ and so $$ f(z - \gamma^*\nabla f(z)) \leq f(z) -\frac{\gamma^*}{2} ||\nabla f(z)||^2 \leq f(z) - \frac{\gamma^* \epsilon^2}{2} \quad (Eq. **) $$ Now by the line search we have for all $i$: \begin{align} f(x_{k[i]+1}) &\overset{(a)}{=}\min_{\gamma \geq 0} \left[f\left(x_{k[i]} - \gamma \nabla f(x_{k[i]})\right)\right] \\ &\overset{(b)}{\leq} f(x_{k[i]} - \gamma^*\nabla f(x_{k[i]})) \\ &= f(z- \gamma^*\nabla f(z)) + [f(x_{k[i]} - \gamma^*\nabla f(x_{k[i]})) - f(z- \gamma^*\nabla f(z))]\\ &\overset{(c)}{\leq} f(z) -\frac{\gamma^*\epsilon^2}{2} + [f(x_{k[i]} - \gamma^*\nabla f(x_{k[i]})) - f(z- \gamma^*\nabla f(z))] \end{align} where (a) holds by the line-search algorithm; (b) holds because $\gamma^*>0$; (c) holds by (Eq. **). Subtracting $f(x_{k[i]})$ from both sides of the above inequality gives $$ f(x_{k[i]+1})-f(x_{k[i]}) \leq -\frac{\gamma^*\epsilon^2}{2} + [f(z)-f(x_{k[i]})] + [f(x_{k[i]} - \gamma^*\nabla f(x_{k[i]})) - f(z- \gamma^*\nabla f(z))]$$ Taking a limit as $i\rightarrow \infty$ gives $$ 0 \leq -\frac{\gamma^*\epsilon^2}{2} $$ where the left-hand-side uses (Eq. *) and the right-hand-side uses continuity of $f$ and $\nabla f$ and the fact that $x_{k[i]}\rightarrow z$. This is a contradiction. $\Box$

Convergence to local minimum of gradient descent with exact line search for $C^1$ function

There are 1 best solutions below

Related Questions in OPTIMIZATION

Related Questions in NONLINEAR-OPTIMIZATION

Related Questions in GRADIENT-DESCENT

Trending Questions

Popular # Hahtags

Popular Questions