Background
I'm working through "The constrained case" in Section 3.2 of Bubeck's Convex Optimization: Algorithms and Complexity. The part I'm confused about is the proof of Theorem 3.7:
Let $f$ be convex and $\beta$-smooth on $\mathcal X$. Then projected gradient descent with $\eta = 1/\beta$ satisfies \begin{align*} f(x_t) - f(x^\star) \leq \frac{3 \beta \|x_1 - x^\star\|^2 + f(x_1) - f(x^\star)}{t} \end{align*}
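For concreteness, here is a small numerical sanity check of the theorem's bound (my own illustration, not from the book; the objective, constraint set, and starting point are arbitrary choices). It runs projected gradient descent with $\eta = 1/\beta$ on a smooth least-squares objective over the unit ball and verifies $f(x_t) - f(x^\star) \leq (3\beta\|x_1 - x^\star\|^2 + f(x_1) - f(x^\star))/t$ at every step:

```python
import numpy as np

# Minimize f(x) = 0.5 * ||A x - b||^2 over the Euclidean ball ||x|| <= 1.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

beta = np.linalg.eigvalsh(A.T @ A).max()  # smoothness constant of f

def f(x):
    r = A @ x - b
    return 0.5 * r @ r

def grad(x):
    return A.T @ (A @ x - b)

def project(x):
    # Euclidean projection onto the unit ball
    n = np.linalg.norm(x)
    return x if n <= 1 else x / n

x1 = np.zeros(3)  # starting point x_1

# Approximate x* by running the method to (numerical) convergence first.
xs = x1.copy()
for _ in range(100_000):
    xs = project(xs - grad(xs) / beta)
f_star = f(xs)

# Check the bound of Theorem 3.7 along the trajectory.
x = x1.copy()
for t in range(1, 51):
    bound = (3 * beta * np.linalg.norm(x1 - xs) ** 2 + f(x1) - f_star) / t
    assert f(x) - f_star <= bound + 1e-9, t
    x = project(x - grad(x) / beta)  # x_{t+1}

print("Theorem 3.7 bound holds for t = 1..50")
```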
The problem
The proof makes sense to me up until the point where induction is used as follows. Define $\delta_s = f(x_s) - f(x^\star)$. It states
...the two above displays will imply \begin{align} \delta_{t+1} \leq \delta_t - \frac{\delta_{t+1}^2}{2\beta \| x_1 - x^\star\|^2} \end{align} An easy induction shows that \begin{align*} \delta_s \leq \frac{3 \beta \|x_1 - x^\star\|^2 + f(x_1) - f(x^\star)}{s}. \end{align*}
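As a check that the recursion really does force $O(1/s)$ decay, the following sketch (my own, not from the book) iterates the worst case of the recursion, i.e. with equality, by solving the quadratic $\delta_{t+1}^2/C + \delta_{t+1} - \delta_t = 0$ for its positive root with $C = 2\beta\|x_1 - x^\star\|^2$, and compares against the claimed bound; the values of $\beta$, $\|x_1 - x^\star\|$, and $\delta_1$ are arbitrary:

```python
import math

beta, R = 1.0, 1.0       # R stands for ||x_1 - x*||
C = 2 * beta * R**2      # constant in the recursion's denominator
delta = 5.0              # delta_1 = f(x_1) - f(x*), an arbitrary start
delta1 = delta

for s in range(1, 1001):
    bound = (3 * beta * R**2 + delta1) / s   # claimed bound at step s
    assert delta <= bound + 1e-12, (s, delta, bound)
    # Worst case of delta_{t+1} <= delta_t - delta_{t+1}^2 / C is equality:
    # take the positive root of x^2 + C x - C delta_t = 0.
    delta = (-C + math.sqrt(C**2 + 4 * C * delta)) / 2

print("claimed bound holds for s = 1..1000")
```

Since equality in the recursion is the slowest possible decrease, the assertions passing for these parameters is consistent with the book's claim, though of course not a proof.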
How is induction used to obtain this result? It is stated as if it were simple, but I'm having trouble seeing it.
My attempt
For the base case $s = 1$, the claim $\delta_1 \leq 3 \beta \|x_1 - x^\star\|^2 + f(x_1) - f(x^\star)$ holds trivially, since the right-hand side is $\delta_1$ plus a nonnegative term. From here, I assume the result holds for all $k \leq s$. Combining the recursion with the induction hypothesis, I get \begin{align*} \delta_{s + 1} \leq \frac{3 \beta \|x_1 - x^\star\|^2 + f(x_1) - f(x^\star)}{s} - \frac{\delta_{s+1}^2}{2\beta \| x_1 - x^\star\|^2}, \end{align*} but I'm not sure how to deal with the $\delta_{s+1}^2$ term on the right-hand side. The algebraic manipulations I tried were unsuccessful.