Background
I'm working through "The constrained case" in Section 3.2 of Bubeck's Convex Optimization: Algorithms and Complexity. The part I'm confused about is the proof of Theorem 3.7:
Let $f$ be convex and $\beta$-smooth on $\mathcal X$. Then projected gradient descent with $\eta = 1/\beta$ satisfies \begin{align*} f(x_t) - f(x^\star) \leq \frac{3 \beta \|x_1 - x^\star\|^2 + f(x_1) - f(x^\star)}{t} \end{align*}
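For concreteness, here is a small numerical sanity check of the theorem's bound (my own illustration, not from the book; the objective, constraint set, and starting point are arbitrary choices). It runs projected gradient descent with $\eta = 1/\beta$ on a smooth least-squares objective over the unit ball and verifies $f(x_t) - f(x^\star) \leq (3\beta\|x_1 - x^\star\|^2 + f(x_1) - f(x^\star))/t$ at every step:

```python
import numpy as np

# Minimize f(x) = 0.5 * ||A x - b||^2 over the Euclidean ball ||x|| <= 1.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

beta = np.linalg.eigvalsh(A.T @ A).max()  # smoothness constant of f

def f(x):
    r = A @ x - b
    return 0.5 * r @ r

def grad(x):
    return A.T @ (A @ x - b)

def project(x):
    # Euclidean projection onto the unit ball
    n = np.linalg.norm(x)
    return x if n <= 1 else x / n

x1 = np.zeros(3)  # starting point x_1

# Approximate x* by running the method to (numerical) convergence first.
xs = x1.copy()
for _ in range(100_000):
    xs = project(xs - grad(xs) / beta)
f_star = f(xs)

# Check the bound of Theorem 3.7 along the trajectory.
x = x1.copy()
for t in range(1, 51):
    bound = (3 * beta * np.linalg.norm(x1 - xs) ** 2 + f(x1) - f_star) / t
    assert f(x) - f_star <= bound + 1e-9, t
    x = project(x - grad(x) / beta)  # x_{t+1}

print("Theorem 3.7 bound holds for t = 1..50")
```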
The problem
The proof makes sense to me up until the point where induction is used as follows. Define $\delta_s = f(x_s) - f(x^\star)$. It states
...the two above displays will imply \begin{align} \delta_{t+1} \leq \delta_t - \frac{\delta_{t+1}^2}{2\beta \| x_1 - x^\star\|^2} \end{align} An easy induction shows that \begin{align*} \delta_s \leq \frac{3 \beta \|x_1 - x^\star\|^2 + f(x_1) - f(x^\star)}{s}. \end{align*}
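As a check that the recursion really does force $O(1/s)$ decay, the following sketch (my own, not from the book) iterates the worst case of the recursion, i.e. with equality, by solving the quadratic $\delta_{t+1}^2/C + \delta_{t+1} - \delta_t = 0$ for its positive root with $C = 2\beta\|x_1 - x^\star\|^2$, and compares against the claimed bound; the values of $\beta$, $\|x_1 - x^\star\|$, and $\delta_1$ are arbitrary:

```python
import math

beta, R = 1.0, 1.0       # R stands for ||x_1 - x*||
C = 2 * beta * R**2      # constant in the recursion's denominator
delta = 5.0              # delta_1 = f(x_1) - f(x*), an arbitrary start
delta1 = delta

for s in range(1, 1001):
    bound = (3 * beta * R**2 + delta1) / s   # claimed bound at step s
    assert delta <= bound + 1e-12, (s, delta, bound)
    # Worst case of delta_{t+1} <= delta_t - delta_{t+1}^2 / C is equality:
    # take the positive root of x^2 + C x - C delta_t = 0.
    delta = (-C + math.sqrt(C**2 + 4 * C * delta)) / 2

print("claimed bound holds for s = 1..1000")
```

Since equality in the recursion is the slowest possible decrease, the assertions passing for these parameters is consistent with the book's claim, though of course not a proof.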
How is induction used to obtain this result? It is stated as if it were simple, but I'm having trouble seeing it.
My attempt
For the base case $s = 1$, the claim $\delta_1 \leq 3 \beta \|x_1 - x^\star\|^2 + f(x_1) - f(x^\star)$ holds trivially, since the right-hand side is $\delta_1$ plus a nonnegative term. From here, I assume the result holds for all $k \leq s$. Combining the recursion with the induction hypothesis, I get \begin{align*} \delta_{s + 1} \leq \frac{3 \beta \|x_1 - x^\star\|^2 + f(x_1) - f(x^\star)}{s} - \frac{\delta_{s+1}^2}{2\beta \| x_1 - x^\star\|^2}, \end{align*} but I'm not sure how to deal with the $\delta_{s+1}^2$ term on the right-hand side. The algebraic manipulations I tried were unsuccessful.