In *Introductory Lectures on Convex Optimization* by Nesterov, Section 3.2.1, a family of difficult functions is constructed to show that all first-order schemes of a certain class perform badly on them. Consider an unconstrained minimization problem $$\min_{x\in\mathbb R^n}f(x),$$ where $f(x)$ is convex and Lipschitz continuous on a bounded set. At each point $\hat x$ we can compute a subgradient of $f$ at $\hat x$, denoted $g(\hat x)$ (if there are several subgradients, one is chosen arbitrarily). The methods under consideration generate a sequence $\{ x_k \}$ with $x_k \in x_0 + \operatorname{span}\{g(x_0),\dots, g(x_{k-1})\}$, intended to converge to a minimizer $x^*$ of $f$. The construction is $$f_k(x) = \gamma\max_{1\leq i\leq k} x^{(i)} + \frac{\mu}{2}\|x\|^2, \quad k=1,\dots,n,$$ and it is finally proved that $f(x_k)-f^*\geq \frac{MR}{2(1+\sqrt{k+1})}$, where $M$ is the Lipschitz constant and $R$ is the radius of the region.
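To see why the construction is tied to the dimension, here is a small numerical sketch (not Nesterov's exact argument): running a subgradient method on $f_k$ from $x_0 = 0$, each step can activate at most one new coordinate, so after $t < k$ steps the iterate has not yet "seen" coordinates $t+1,\dots,k$ and the max term still evaluates to $0$. The values of `gamma`, `mu`, and the step sizes below are illustrative choices, not taken from the book.

```python
import numpy as np

n, k = 10, 8          # illustrative dimensions, n >= k
gamma, mu = 1.0, 1.0  # illustrative constants

def f(x):
    # f_k(x) = gamma * max_{1<=i<=k} x^{(i)} + (mu/2) * ||x||^2
    return gamma * np.max(x[:k]) + 0.5 * mu * np.dot(x, x)

def subgrad(x):
    # One subgradient: gamma * e_j + mu * x, with j a maximizing index.
    # np.argmax returns the FIRST maximizer, mimicking a "resisting" oracle
    # that breaks ties toward the lowest untouched coordinate.
    j = int(np.argmax(x[:k]))
    g = mu * x.copy()
    g[j] += gamma
    return g

x = np.zeros(n)
for t in range(1, k + 1):
    x = x - (0.5 / t) * subgrad(x)  # diminishing steps (illustrative)
    support = np.nonzero(np.abs(x) > 1e-12)[0]
    # after t steps, only coordinates 1..t (0-indexed: 0..t-1) are active
    assert support.max() <= t - 1
    print(t, f(x))
```

Since every touched coordinate stays nonpositive, the max over the untouched coordinates $t+1,\dots,k$ remains $0$, so $f(x_t)\ge 0 > f^*$ for $t < k$; this is the span restriction at work, and it is exactly what stops mattering once $t$ reaches the dimension.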
My question is: why do they restrict $k$ to be at most $n$? What if I want to prove a similar result for $k>n$?