Suppose we want to solve the following optimization problem: $$ \min_{x\in\mathcal{X}\subset\mathbb{R}^n} f(x) $$ where $\mathcal{X}$ is closed and convex and $f$ can be nonconvex but still smooth.
Projected gradient descent for this problem reads: $$ x_{t+1}\leftarrow \mathrm{proj}_{\mathcal{X}}(x_{t} - \eta_t \nabla f(x_t)) $$ where $\eta_t>0$ is the stepsize.
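For concreteness, here is a minimal sketch of the iteration, assuming a box constraint $\mathcal{X}=[0,1]^2$ so that the projection is just coordinate-wise clipping (the quadratic objective and the constants below are illustrative, not from the question):

```python
import numpy as np

def projected_gradient_descent(grad_f, proj, x0, stepsizes):
    """Iterate x <- proj(x - eta * grad_f(x)) for each stepsize eta."""
    x = np.asarray(x0, dtype=float)
    for eta in stepsizes:
        x = proj(x - eta * grad_f(x))
    return x

# Example: minimize f(x) = ||x - c||^2 / 2 over the box [0, 1]^2;
# the projection onto a box is coordinate-wise clipping.
c = np.array([2.0, -1.0])
grad_f = lambda x: x - c
proj = lambda x: np.clip(x, 0.0, 1.0)

x_star = projected_gradient_descent(grad_f, proj, np.array([0.5, 0.5]),
                                    [0.5] * 200)
# The constrained minimizer is the projection of c onto the box, i.e. [1, 0].
```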
A commonly used convergence (stopping) criterion when $f$ is nonconvex is the gradient mapping $$ G_t:= \frac{1}{\eta_t}\{x_t - \mathrm{proj}_{\mathcal{X}}(x_{t} - \eta_t \nabla f(x_t))\}, $$ which reduces to $\nabla f(x_t)$ if $\mathcal{X}=\mathbb{R}^n$. Usually we take $\eta_t$ to be a constant or a non-summable sequence, i.e., $\sum_{t=1}^{\infty}\eta_t=\infty$ (see, for example, Bertsekas, Nonlinear Programming, Sec. 3.3).
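The quantity $G_t$ is easy to compute alongside the iteration; a short sketch (the identity-projection sanity check is my own illustration):

```python
import numpy as np

def gradient_mapping(x, grad, proj, eta):
    """G = (x - proj(x - eta * grad)) / eta.

    With proj = identity (unconstrained case) this reduces to grad itself."""
    return (x - proj(x - eta * grad)) / eta

# Sanity check: unconstrained case, the mapping equals the gradient.
proj_id = lambda z: z
g = gradient_mapping(np.array([1.0, 2.0]), np.array([3.0, -4.0]), proj_id, 0.1)
# g equals [3.0, -4.0] regardless of eta.
```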
My question is: what if I take $\eta_t$ to be a summable sequence, say $\eta_t=1/t^2$? Can we prove or disprove the claim that if $G_t\rightarrow 0$ in this case, the algorithm still converges to a stationary point? By a stationary point I mean a point $x^*$ such that $$ \nabla f(x^*)^\top(x-x^*)\geq 0,\ \forall x\in\mathcal{X}. $$
(Note that the question above also applies to the more general proximal gradient method, where we just need to replace the projection operator $\mathrm{proj}_{\mathcal{X}}$ with the proximal operator $\mathrm{prox}$.)
Whether $\eta_t$ is summable has little to do with the convergence criterion itself; it affects the actual convergence of the iterates. The stopping criterion you mention works regardless of whether $\eta_t$ is constant, non-summable, or summable.
Consider the linear function $f(x) = x_1 + x_2$, whose gradient is the constant vector $[1, 1]$. A summable stepsize such as $\eta_t=\frac{1}{t^2}$ lets the iterates move only a bounded total distance, even after infinitely many steps, since $\sum_{t=1}^{\infty}\eta_t = \frac{\pi^2}{6}$. The iterates therefore converge to a point that is not stationary, while $G_t = [1, 1]^\top$ never approaches zero, so the criterion correctly never signals convergence. The same analogy extends to nonlinear functions whenever $\eta_t$ decays much faster than the gradient norm shrinks.
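This stalling behavior can be checked numerically; a minimal sketch on the linear example above (unconstrained, so the projection is the identity):

```python
import numpy as np

# Gradient descent on f(x) = x1 + x2, whose gradient is constantly [1, 1],
# with the summable stepsizes eta_t = 1/t^2. The total displacement is
# bounded by ||[1, 1]|| * sum_t 1/t^2 = sqrt(2) * pi^2 / 6, so the iterates
# converge to a finite, non-stationary point; meanwhile G_t = [1, 1]
# for every t, i.e. the criterion never goes to zero.
x = np.zeros(2)
for t in range(1, 100_001):
    x = x - (1.0 / t**2) * np.ones(2)

# Each coordinate approaches -pi^2/6 ~ -1.6449: the iterates stall.
```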