Convergence of gradient descent without global Lipschitz gradient assumption

Question

Convergence of gradient descent without global Lipschitz gradient assumption

619 Views Asked by Bumbble Comm At 27 Mar 2026 - 3:48

My question is very similar to this one and this one, but they haven't been answered.

Let $f \in C^2(\mathbb{R}^d, \mathbb{R})$ have compact sublevel sets and isolated critical points, and consider the gradient descent update $$ x_{k+1} = x_k-\alpha\nabla f(x_k) $$ for some fixed initial point $x_0$ and learning rate $\alpha$. If $f$ has $L$-Lipschitz gradient globally, it is known that $x_k$ converges to a critical point of $f$ for any $0 < \alpha < 2/L$. Now assume we drop the Lipschitz assumption. The set $U_0 = \{ f(x) \leq f(x_0) \}$ is compact and $\nabla f \in C^1$, so we can define $L = \sup_{x \in U} \lVert \nabla^2 f(x) \rVert < \infty$ (in $L^2$ norm).

I would like to prove (or disprove) that $x_k \in U_0$ for all $k$ for all $0 < \alpha < 2/L$. This would imply that $x_k$ converges to a critical point since $f|_U$ is $L$-Lipschitz. The idea would be to prove $f(x_{k+1}) \leq f(x_k)$ and conclude by induction, by Taylor expanding \begin{align*} f(x_{k+1}) &= f(x_k-\alpha \nabla f(x_k)) \\ &= f(x_k) - \alpha \lVert \nabla f(x_k) \rVert^2 + \frac{\alpha^2}{2}\nabla f(x_k)^T\nabla^2 f(x_k-t\alpha\nabla f(x_k))f(x_k) \end{align*} for some $t \in (0, 1)$. Now if we assume $(x_k-t\alpha\nabla f(x_k)) \in U$, we can conclude $$f(x_{k+1}) \leq f(x_k) - \alpha \lVert \nabla f(x_k) \rVert^2\left(1-\frac{\alpha L}{2}\right) \leq f(x_k)$$ for $\alpha < 2/L$, but this is (almost) a circular assumption... Any ideas?

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

This holds: here is a proof.$\newcommand{\T}{x}\newcommand{\al}{\alpha}\newcommand{\bal}{\bar{\alpha}}$

Define $U_\al = \{ \T-t\al\nabla f(\T) \mid t \in [0,1], \T\in U_0 \}$ and the continuous function $L(\al) = \sup_{\T \in U_\al} \lVert{\nabla^2 f(\T)}\rVert$. Notice that $U_0 \subset U_{\al}$ for all $\al < \al'$. We prove that $\al L(\al) < 2$ implies $U_\al = U_0$ and in particular, $L(\al) = L(0) = L$. By Taylor expansion,

$$f(\T-t\al\nabla f) = f(\T) - \al \lVert{\nabla f(\T)}\rVert^2 + \frac{t^2\al^2}{2}\nabla f(\T)^T\nabla^2 f(\T-t'\al\nabla f)f(\T) $$

for some $t' \in [0,t] \subset [0,1]$. Since $\T-t'\al\nabla f \in U_\al$, it follows that

$$ f(\T-t\al\nabla f) \leq f(\T) -\al\lVert{\nabla f(\T)}\rVert^2(1-\al L(\al)/2) \leq f(\T) $$

for all $\al L(\al) < 2$. In particular, $\T-t\al\nabla f \in U_0$ and hence $U_\al = U_0$. We conclude that $\al L(\al) < 2$ implies $L(\al)=L$, implying in turn $\al L < 2$. We now claim the converse, namely that $\al L < 2$ implies $\al L(\al) < 2$. For contradiction, assume otherwise that there exists $\al' L < 2$ with $\al'L(\al') \geq 2$. Since $\al L(\al)$ is continuous and $0 L(0) = 0 < 2$, there exists $\bal \leq \al'$ such that $\bal L < 2$ and $\bal L(\bal) = 2$. This is in contradiction with continuity:

$$ 2 = \bal L(\bal) = \lim_{\al\to\bal^-} \al L(\al) = \lim_{\al\to\bal^-} \al L = \bal L \,. $$

Finally we conclude that $U_\al = U_0$ for all $\al L < 2$. In particular, $\T_0 \in U_0$ implies $\T_k \in U_0$ by induction.

Convergence of gradient descent without global Lipschitz gradient assumption

There are 1 best solutions below

Related Questions in REAL-ANALYSIS

Related Questions in OPTIMIZATION

Related Questions in TAYLOR-EXPANSION

Related Questions in LIPSCHITZ-FUNCTIONS

Related Questions in GRADIENT-DESCENT

Trending Questions

Popular # Hahtags

Popular Questions