Does gradient descent converge in convex optimization problems? If so, how?


Sorry in advance if this question sounds too broad or a little bit too obvious.

I know for sure that gradient descent, i.e., the update equation

$x_{k+1} = x_k - \epsilon_k \nabla f(x_k)$

converges to the unique minimizer of $f$ (with $\text{dom}(f) = \mathbb{R}^n$) whenever $f$ is strictly or strongly convex.
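To fix notation, here is a minimal sketch of the plain gradient-descent loop I have in mind (the quadratic objective at the end is just an illustration I made up):

```python
import numpy as np

def gradient_descent(grad_f, x0, step, num_iters=1000):
    """Plain gradient descent: x_{k+1} = x_k - eps_k * grad_f(x_k).

    `step` can be a constant or a function of the iteration index k.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        eps_k = step(k) if callable(step) else step
        x = x - eps_k * grad_f(x)
    return x

# Illustration on the strongly convex quadratic f(x) = 0.5 * ||x - 1||^2,
# whose unique minimizer is the all-ones vector.
print(gradient_descent(lambda x: x - 1.0, x0=np.zeros(3), step=0.5))
```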

However, I cannot remember whether it converges to a minimizer for merely convex functions, and if so, how it achieves this convergence.

What is bothering me is that

  1. I've seen some conflicting results where, instead of $x_k$, it is the averaged sequence $\hat x_{K} = \frac{1}{K} \sum_{k=1}^{K} x_k$ that converges.

  2. I've also seen conflicting results where the step size is decreasing, e.g. $\epsilon_k = o(1/k)$, versus results where it is constant.

  3. There is also the issue of weak vs strong convergence. I'm not sure what this means exactly.

I know some results, but they are for quadratic functions, not for convex functions in general.

Can someone chime in on what this basic result in optimization looks like?

BEST ANSWER

In https://stanford.edu/~boyd/papers/pdf/monotone_primer.pdf, Section 5 (the subsection "Gradient method", also called "Forward step method") gives a convergence proof for functions that are strongly convex with Lipschitz gradient, using a constant step size.
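For concreteness, here is a minimal sketch of that setting (my own toy example, not taken from the paper): a strongly convex quadratic has a Lipschitz gradient with constant $L = \lambda_{\max}(Q)$, and a constant step size $1/L$ drives the iterates to the unique minimizer at a linear rate.

```python
import numpy as np

# Strongly convex quadratic: f(x) = 0.5 * x^T Q x - b^T x with Q positive definite,
# so grad f(x) = Q x - b is Lipschitz with constant L = lambda_max(Q).
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(Q, b)           # the unique minimizer

L = np.linalg.eigvalsh(Q).max()
x = np.zeros(2)
for _ in range(200):
    x = x - (1.0 / L) * (Q @ x - b)      # constant step size 1/L

print(np.linalg.norm(x - x_star))        # essentially 0: linear (geometric) convergence
```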

If the function is convex with Lipschitz gradient, then https://www.stat.cmu.edu/~ryantibs/convexopt-F13/scribes/lec6.pdf gives a proof of convergence with a constant step size (the objective value converges to the optimum at rate $O(1/k)$).
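And a sketch of the merely convex case (again my own toy example, not from the notes): a rank-deficient least-squares problem is convex with Lipschitz gradient but not strictly convex, yet gradient descent with step size $1/L$ still drives the objective to its minimum.

```python
import numpy as np

# Convex but NOT strictly convex: least squares with a rank-deficient matrix,
# f(x) = 0.5 * ||A x - b||^2, which has an entire affine set of minimizers.
A = np.array([[1.0, 1.0], [2.0, 2.0]])   # rank 1
b = np.array([1.0, 2.0])                 # lies in the range of A, so min f = 0

L = np.linalg.eigvalsh(A.T @ A).max()    # Lipschitz constant of the gradient
x = np.array([5.0, -3.0])
for _ in range(500):
    x = x - (1.0 / L) * (A.T @ (A @ x - b))   # constant step size 1/L

print(0.5 * np.sum((A @ x - b) ** 2))    # -> 0.0, the optimal value
print(x)                                 # one particular minimizer; depends on x0
```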

For strictly convex functions whose gradient is not Lipschitz, gradient descent with a constant step size is not guaranteed to converge. Try this very simple example: let $f(x) = x^{4}$. You will see that no constant step size works for every initial condition; for any fixed step size, starting far enough from $0$ makes the iterates diverge, since the gradient $4x^3$ grows too quickly. In this case, people use diminishing step sizes. I've never seen results about convergence of the average, but it might be what happens when you use a constant step size on functions that are strictly convex and don't have a Lipschitz gradient.
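A quick numerical check of that claim (the step size and starting points are my own choices):

```python
def gd_on_x4(x0, step, iters):
    """Gradient descent on f(x) = x**4; the gradient 4*x**3 is not globally Lipschitz."""
    x = x0
    traj = [x]
    for _ in range(iters):
        x = x - step * 4 * x ** 3
        traj.append(x)
    return traj

# The same constant step size from two starting points:
print(gd_on_x4(x0=0.5, step=0.01, iters=5))   # 0.5, 0.495, ... creeps toward 0
print(gd_on_x4(x0=10.0, step=0.01, iters=5))  # 10, -30, 1050, ... blows up
```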

Unless $x$ lives in an infinite-dimensional space, weak and strong convergence are the same and I wouldn't worry about it. Here is additional reading if you are interested: https://people.maths.bris.ac.uk/~mazag/fa17/lecture9.pdf