Compute the complexity of the gradient descent.

Question

Compute the complexity of the gradient descent.

139 Views Asked by Bumbble Comm At 30 Mar 2026 - 9:55

Let $f$ be a $L$-smooth and convex function. We denote $\|\cdot\|$ to be the $\ell_{2}$-norm on $\mathbb{R}^{n}$. The definition of $f$ being $L$-smooth is the following $$f(x)\leq f(y)+\langle \nabla f(y),x-y\rangle+\dfrac{L}{2}\|x-y\|^{2},\ \forall x,y$$ With this function, we take the step size $h:=\frac{1}{L}$ for each iteration in the gradient descent, i.e. $$x_{k+1}=x_{k}-\frac{1}{L}\nabla f(x_{k}).$$

I have been trying to prove the following lemma

Lemma: for all $k\geq 0$, we have $$f(x_{k})\leq f_{\ast}+\dfrac{2L\|x_{0}-x_{\ast}\|^{2}}{k+4}.$$

to prove the following fact:

$$\min_{0\leq i\leq K}\|\nabla f(x_{i})\|\leq \dfrac{4L\|x_{0}-x_{*}\|}{\sqrt{(K+4)(K+6)}}.$$

One more question, how do I use this inequality to compute the complexity? In other words, if I set $$\dfrac{4L\|x_{0}-x_{*}\|}{\sqrt{(K+4)(K+6)}}\leq\epsilon$$ we have $$\dfrac{16L^{2}\|x_{0}-x_{*}\|^{2}}{\epsilon^{2}}\leq (K+4)(K+6).$$ From here, what should I do to obtain $K=O(\text{some functions in}\ \epsilon)$?

Edit 1: Some more attempts

To make this post cleaner, I erased the previous attempts. I think that this attempt is the best I can get. I am not sure how to have a tighter upper bound.

Use the definition of $L$-smooth, applying to $y=x_{k+1}$ and $x=x_{k}$ and since $x_{k+1}-x_{k}=-\frac{1}{L}\nabla f(x_{k})$, we have $$f(x_{k+1})\leq f(x_{k})-\dfrac{1}{2L}\|\nabla f(x_{k})\|^{2},$$ rearranging the term, we have $$\dfrac{1}{2L}\|\nabla f(x_{k})\|^{2}\leq f(x_{k})-f(x_{k+1}).$$ Note that for each $k$, we have $f(x_{k+1})\geq f_{*}$, so $$\dfrac{1}{2L}\|\nabla f(x_{k})\|^{2}\leq f(x_{k})-f(x_{k+1})\leq f(x_{k})-f_{*}.$$

Now, consider the lemma $$f(x_{k})-f_{*}\leq \dfrac{2L\|x_{0}-x_{*}\|^{2}}{k+4}.$$ We sum both sides over all $k=0,\dots, K$, so that $$\dfrac{1}{2L}\sum_{k=0}^{K}\|\nabla f(x_{k})\|^{2}\leq \sum_{k=0}^{K}(f(x_{k})-f_{*})\leq 2L\|x_{0}-x_{*}\|^{2}\sum_{k=0}^{K}\dfrac{1}{k+4}.$$

For the LHS, we have $$(K+1)\dfrac{1}{2L}\min_{0\leq k\leq K}\|\nabla f(x_{k})\|^{2}\leq \dfrac{1}{2L}\sum_{k=0}^{K}\|\nabla f(x_{k})\|^{2}.$$ Hence, we have $$\min_{0\leq k\leq K}\|\nabla f(x_{k})\|^{2}\leq 4L^{2}\dfrac{1}{K+1}\|x_{0}-x_{*}\|^{2}\sum_{k=0}^{K}\dfrac{1}{k+4}.$$

Here then comes the hard part. I am not sure how to increase the bound on the RHS. In fact, I do not think that I can. To see this, to obtain the upper bound $\frac{16L^{2}\|x_{0}-x_{*}\|^{2}}{(K+4)(K+6)}$, we basically need to prove that $$\dfrac{1}{K+1}\sum_{k=0}^{K}\dfrac{1}{k+4}\leq \dfrac{4}{(K+4)(K+6)},\forall\ K\geq 0.$$

This does not even hold for $K=0$. And by using Wolframalpha, we can see that the solution area actually lies in a subset of $K<0$. This implies that I enlarged the RHS too much. But I do not know what I should do instead. Any idea?

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

This is a partial answer only, it responds to proving the lemma and the complexity question at the end. It also improves slightly the bound you proved without reaching your goal. You may want to specify why you believe that bound is correct in the first place, it could help people prove it.

A very nice proof of your Lemma is present in here. I find that it is a very good resource. Observe that their definition of smoothness is slightly different to yours but theirs implies yours in Lemma 1, so we are fine. Also note that they have a $k+3$ in the denominator since they go from $1$ to $k$ and not from $0$ to $K$ as in your case, but it is the same Lemma.

In your proof, instead of summing the equation $\frac{1}{2L} \| \nabla f(x_k) \|^2\leq \frac{2L \| x_0-x^\ast\|^2}{k+4}$, you should take the minimum on both sides to get \begin{align*} \min_{1\leq k \leq K} \| \nabla f(x_k) \| \leq \min_{1\leq k \leq K} \frac{2L \| x_0-x^\ast\|}{\sqrt{k+4}}&=\frac{2L \| x_0-x^\ast\|}{\sqrt{K+4}} \end{align*}

which is asymptotically tighter than what you had, indeed $\sum_{k=0}^K\frac{1}{k+4}=O(\ln(K))$.

For you last question, observe that $(K+4)(K+6)=K^2+10K + 24=(K+5)^2-1$ and therefore $(K+5)^2 = O\left( \frac{1}{\varepsilon^2} \right)$, i.e. taking the square root and removing the $+5$ we get $K=O\left( \frac{1}{\varepsilon} \right)$.

Compute the complexity of the gradient descent.

There are 1 best solutions below

Related Questions in ANALYSIS

Related Questions in OPTIMIZATION

Related Questions in CONVEX-ANALYSIS

Related Questions in CONVEX-OPTIMIZATION

Related Questions in GRADIENT-DESCENT

Trending Questions

Popular # Hahtags

Popular Questions