Suppose $f: \mathbb R^n \to \mathbb R$ is a $C^1$ convex function whose gradient is $L$-Lipschitz continuous, i.e., $\|\nabla f(x) - \nabla f(y)\|_2 \le L\|x-y\|_2$ for all $x, y$. Consider the gradient update scheme $$x_{k+1} = x_k - \alpha_k \nabla f(x_k),$$ where $\alpha_k = \alpha_k(\nabla f(x_k))$ means $\alpha_k$ is a function of $\nabla f(x_k)$ with $\alpha_k \in o(\|\nabla f(x_k)\|_2)$, i.e., $\alpha_k \to 0$ as $\nabla f(x_k) \to 0$. It might make no sense to consider a vanishing stepsize, but think of this as a thought experiment.
I am wondering whether the sequence $\{f(x_k)\}$ will converge to $f(x_*)$, where $x_*$ is a minimizer of $f$. At first I thought this was obvious, but then it occurred to me that the per-step reduction $f(x_{k-1}) - f(x_k)$ could be arbitrarily small, so it is less obvious to me now.
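To get a feel for the question, here is a quick numerical sketch on the toy problem $f(x) = x^2/2$, using the concrete (assumed, not prescribed above) choice $\alpha_k = c\,\|\nabla f(x_k)\|_2$, which vanishes with the gradient:

```python
# Toy experiment: gradient descent on f(x) = x^2 / 2, so grad f(x) = x,
# x_* = 0 and f(x_*) = 0.  The stepsize vanishes with the gradient:
# alpha_k = c * |grad f(x_k)|, one concrete instance of the scheme.

def run(x0, c, iters):
    x = x0
    for _ in range(iters):
        g = x                   # gradient of f(x) = x^2 / 2
        alpha = c * abs(g)      # stepsize shrinks as the gradient shrinks
        x = x - alpha * g
    return x

x = run(x0=1.0, c=0.5, iters=10_000)
print(x, 0.5 * x**2)
```

For $0 < x_k < 1/c$ the recursion becomes $x_{k+1} = x_k(1 - c x_k)$, which still drives $x_k \to 0$, but only at the slow rate $x_k \approx 1/(ck)$; so at least in this toy case $f(x_k) \to f(x_*)$ despite the vanishing stepsize.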
As a start, consider that at each iteration, we have the following inequality:
$$ \begin{align} \|x^{(k+1)} - x^*\|_2^2 &= \|x^{(k)} - \alpha_k \nabla f(x^{(k)}) - x^*\|_2^2 \\ &= \|x^{(k)} - x^*\|_2^2 + \alpha_k^2 \|\nabla f(x^{(k)})\|_2^2 - 2\alpha_k \nabla f(x^{(k)})^T (x^{(k)} - x^*) \\ &\leq \|x^{(k)} - x^*\|_2^2 + \alpha_k^2 \|\nabla f(x^{(k)})\|_2^2 - 2\alpha_k (f(x^{(k)}) - f(x^*)) \end{align} $$ where the last step uses convexity: $\nabla f(x^{(k)})^T (x^{(k)} - x^*) \geq f(x^{(k)}) - f(x^*)$.
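As a sanity check (not a proof), the per-iteration inequality can be verified numerically on a simple convex quadratic; the objective and the stepsize schedule below are arbitrary illustrative choices:

```python
# f(x1, x2) = (x1^2 + 2*x2^2) / 2 is convex with minimizer x* = (0, 0),
# so f(x*) = 0 and grad f(x) = (x1, 2*x2).
def f(x):
    return 0.5 * (x[0]**2 + 2 * x[1]**2)

def grad(x):
    return (x[0], 2 * x[1])

def inequality_holds(x0, steps):
    """Check the displayed inequality along a gradient-descent run."""
    x, ok = x0, True
    for k in range(steps):
        g = grad(x)
        alpha = 1.0 / (k + 2)                     # some vanishing stepsize
        x_next = (x[0] - alpha * g[0], x[1] - alpha * g[1])
        lhs = x_next[0]**2 + x_next[1]**2         # ||x^(k+1) - x*||^2
        rhs = (x[0]**2 + x[1]**2                  # ||x^(k) - x*||^2
               + alpha**2 * (g[0]**2 + g[1]**2)   # + alpha_k^2 ||grad||^2
               - 2 * alpha * f(x))                # - 2 alpha_k (f(x^(k)) - f*)
        ok = ok and (lhs <= rhs + 1e-12)
        x = x_next
    return ok

print(inequality_holds((3.0, -1.5), 50))
```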
We can rearrange and sum this over $k = 0, \ldots, K-1$ so that $$ 2\sum_{k=0}^{K-1} \alpha_k (f(x^{(k)}) - f(x^*)) \leq \|x^{(0)} - x^*\|_2^2 + \sum_{k=0}^{K-1} \alpha_k^2 \|\nabla f(x^{(k)})\|_2^2. $$ If the gradient norms are bounded, say $\|\nabla f(x^{(k)})\|_2 \leq G$ for all $k$ (which holds, e.g., when the iterates stay in a bounded set, since $\|\nabla f(x^{(k)})\|_2 \leq L\|x^{(k)} - x^*\|_2$), then $$ f(x^{(\hat{k})}) - f(x^*) \leq \frac{\|x^{(0)} - x^*\|_2^2}{2\sum_{k=0}^{K-1} \alpha_k} + \frac{G^2 \sum_{k=0}^{K-1} \alpha_k^2}{2\sum_{k=0}^{K-1} \alpha_k} $$ where $x^{(\hat{k})}$ is the minimizer of $f$ among the iterates $x^{(0)}, \ldots, x^{(K-1)}$. So one thought would be that we need $\sum_{k=0}^{\infty} \alpha_k = \infty$ and also $\sum_{k=0}^{\infty} \alpha_k^2 < \infty$.
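For instance, with the classical choice $\alpha_k = 1/(k+2)$ (divergent sum, summable squares), the right-hand side can be evaluated numerically and compared against the best-iterate gap; the toy objective $f(x) = x^2/2$ is an illustrative assumption:

```python
# Gradient descent on f(x) = x^2 / 2 (x* = 0, f* = 0) with alpha_k = 1/(k+2):
# sum(alpha_k) diverges while sum(alpha_k^2) stays bounded, so the upper
# bound on the best-iterate gap should shrink toward 0 as K grows.
def best_gap_and_bound(x0, K):
    x, best = x0, 0.5 * x0**2
    sum_a, sum_a2 = 0.0, 0.0
    G = abs(x0)                  # here |grad f(x_k)| = |x_k| <= |x_0|
    for k in range(K):
        alpha = 1.0 / (k + 2)
        sum_a += alpha
        sum_a2 += alpha**2
        x = x - alpha * x        # grad f(x) = x
        best = min(best, 0.5 * x**2)
    bound = (x0**2 + G**2 * sum_a2) / (2 * sum_a)
    return best, bound

best, bound = best_gap_and_bound(1.0, 10_000)
print(best, bound)               # the achieved gap sits far below the bound
```

Here the bound decays like $1/\log K$ because $\sum \alpha_k \approx \log K$ while $\sum \alpha_k^2$ converges, consistent with the two conditions above.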