Suppose we want to solve the following optimization problem: $$ \min_{x\in\mathcal{X}\subset\mathbb{R}^n} f(x) $$ where $\mathcal{X}$ is closed and convex and $f$ can be nonconvex but still smooth.
Projected gradient descent for this problem reads: $$ x_{t+1}\leftarrow \mathrm{proj}_{\mathcal{X}}(x_{t} - \eta_t \nabla f(x_t)) $$ where $\eta_t>0$ is the stepsize.
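For concreteness, here is a minimal sketch of the iteration, assuming a box constraint $\mathcal{X}=[0,1]^2$ so that the projection is just coordinate-wise clipping (the quadratic objective and the constants below are illustrative, not from the question):

```python
import numpy as np

def projected_gradient_descent(grad_f, proj, x0, stepsizes):
    """Iterate x <- proj(x - eta * grad_f(x)) for each stepsize eta."""
    x = np.asarray(x0, dtype=float)
    for eta in stepsizes:
        x = proj(x - eta * grad_f(x))
    return x

# Example: minimize f(x) = ||x - c||^2 / 2 over the box [0, 1]^2;
# the projection onto a box is coordinate-wise clipping.
c = np.array([2.0, -1.0])
grad_f = lambda x: x - c
proj = lambda x: np.clip(x, 0.0, 1.0)

x_star = projected_gradient_descent(grad_f, proj, np.array([0.5, 0.5]),
                                    [0.5] * 200)
# The constrained minimizer is the projection of c onto the box, i.e. [1, 0].
```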
A commonly used convergence (stopping) criterion when $f$ is nonconvex is the gradient mapping $$ G_t:= \frac{1}{\eta_t}\{x_t - \mathrm{proj}_{\mathcal{X}}(x_{t} - \eta_t \nabla f(x_t))\}, $$ which reduces to $\nabla f(x_t)$ if $\mathcal{X}=\mathbb{R}^n$. Usually we take $\eta_t$ to be a constant or a non-summable sequence, i.e., $\sum_{t=1}^{\infty}\eta_t=\infty$ (see, for example, Bertsekas, Nonlinear Programming, Sec. 3.3).
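The quantity $G_t$ is easy to compute alongside the iteration; a short sketch (the identity-projection sanity check is my own illustration):

```python
import numpy as np

def gradient_mapping(x, grad, proj, eta):
    """G = (x - proj(x - eta * grad)) / eta.

    With proj = identity (unconstrained case) this reduces to grad itself."""
    return (x - proj(x - eta * grad)) / eta

# Sanity check: unconstrained case, the mapping equals the gradient.
proj_id = lambda z: z
g = gradient_mapping(np.array([1.0, 2.0]), np.array([3.0, -4.0]), proj_id, 0.1)
# g equals [3.0, -4.0] regardless of eta.
```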
My question is: what if I take $\eta_t$ to be a summable sequence, say $\eta_t=1/t^2$? Can we prove or disprove the claim that if $G_t\rightarrow 0$ in this case, the algorithm still converges to a stationary point? By a stationary point I mean a point $x^*$ such that $$ \nabla f(x^*)^\top(x-x^*)\geq 0,\ \forall x\in\mathcal{X}. $$
(Note that the question above also applies to the more general proximal gradient method, where we just need to replace the projection operator $\mathrm{proj}_{\mathcal{X}}$ with the proximal operator $\mathrm{prox}$.)
Whether $\eta_t$ is summable has little to do with the convergence criterion itself; it affects the actual convergence of the iterates. The stopping criterion you mention works regardless of whether $\eta_t$ is constant, non-summable, or summable.
Consider the linear function $f(x) = x_1 + x_2$, whose gradient is the constant vector $[1, 1]$. A summable stepsize such as $\eta_t=\frac{1}{t^2}$ lets the iterates move only a bounded total distance, even after infinitely many steps, since $\sum_{t=1}^{\infty}\eta_t = \frac{\pi^2}{6}$. The iterates therefore converge to a point that is not stationary, while $G_t = [1, 1]^\top$ never approaches zero, so the criterion correctly never signals convergence. The same analogy extends to nonlinear functions whenever $\eta_t$ decays much faster than the gradient norm shrinks.
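This stalling behavior can be checked numerically; a minimal sketch on the linear example above (unconstrained, so the projection is the identity):

```python
import numpy as np

# Gradient descent on f(x) = x1 + x2, whose gradient is constantly [1, 1],
# with the summable stepsizes eta_t = 1/t^2. The total displacement is
# bounded by ||[1, 1]|| * sum_t 1/t^2 = sqrt(2) * pi^2 / 6, so the iterates
# converge to a finite, non-stationary point; meanwhile G_t = [1, 1]
# for every t, i.e. the criterion never goes to zero.
x = np.zeros(2)
for t in range(1, 100_001):
    x = x - (1.0 / t**2) * np.ones(2)

# Each coordinate approaches -pi^2/6 ~ -1.6449: the iterates stall.
```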