Lower complexity bounds for problems with smooth functions (Nesterov's Lectures, 2004)


This comes from p. 12 of Nesterov's book, Introductory Lectures on Convex Optimization:

We should note that the lower complexity bounds for problems with smooth functions, or for high-order methods, are not much better than those of Theorem 1.1.2. This can be proved using the same arguments and we leave the proof as an exercise for the reader.

Theorem 1.1.2 states

For $\epsilon < \frac{1}{2} L$, the analytical complexity of $\mathcal{C}$ for zero-order methods is at least $\left(\left\lfloor \frac{L}{2\epsilon} \right\rfloor\right)^n$, where $\mathcal{C}$ is the class of $\epsilon$-accuracy minimization problems over a box with a zero-order oracle ($L$ is the Lipschitz constant of the objective and $n$ the dimension).
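To see how hopeless this bound is in practice, here is a quick back-of-the-envelope computation in Python (the values of $L$, $\epsilon$, and $n$ are mine, chosen only for illustration):

```python
import math

# Back-of-the-envelope evaluation of the lower bound in Theorem 1.1.2.
# Illustrative values (my choice): Lipschitz constant L, accuracy eps,
# and dimension n.
L, eps, n = 2.0, 0.01, 10

p = math.floor(L / (2 * eps))  # grid resolution per coordinate: 100
calls = p ** n                 # lower bound on zero-order oracle calls

print(f"{calls:.1e}")  # 1.0e+20 oracle calls -- hopeless even for n = 10
```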

I find this statement very confusing, since isn't the point of all this work to get better bounds for optimization problems? Shouldn't higher-order methods (I assume this means methods with access to higher-order oracles) be able to get better lower bounds?

Any pointers would be helpful. It would be great if someone could state what the exercise is in the form of a theorem statement or the like.

Many thanks.

Best Answer

The proof of Theorem 1.1.2 is based on the idea that whatever algorithm you use to pick your test points, you can run the algorithm for fewer than $p^n$ steps (with $p = \left\lfloor \frac{L}{2\epsilon} \right\rfloor$), returning $f(x) = 0$ at each test point. After you've done this, there will still be an unexplored part of the solution space large enough that you can define $f$ inside this region with a minimum function value of $-\epsilon$.
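To make the construction concrete, here is a minimal Python sketch of that adversarial "resisting oracle" argument (the helper `find_unexplored_cell` and all numeric values are mine, purely for illustration):

```python
import itertools
import math

# Resisting-oracle sketch for minimization over the box [0, 1]^n.
# The oracle answers f(x) = 0 to every query; after fewer than p^n
# queries, some grid cell of side 1/p contains no test point, and the
# adversary can hide a function dipping to -eps inside that cell.

def find_unexplored_cell(queries, p, n):
    """Return the index of a grid cell (out of p^n) containing no query."""
    hit = {tuple(min(int(q_i * p), p - 1) for q_i in q) for q in queries}
    for cell in itertools.product(range(p), repeat=n):
        if cell not in hit:
            return cell
    return None  # cannot happen while len(queries) < p**n

L, eps, n = 2.0, 0.25, 2
p = math.floor(L / (2 * eps))  # p = 4, so 16 cells in dimension 2

queries = [(0.1, 0.2), (0.6, 0.7), (0.9, 0.1)]  # any method's test points
cell = find_unexplored_cell(queries, p, n)
center = tuple((c + 0.5) / p for c in cell)  # hide the minimum here
print(cell, center)  # (0, 1) (0.125, 0.375)
```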

At a technical level, the Lipschitz constant is important because it controls how far below $0$ the function value can go, depending on how close the minimum is to one of the test points where $f(x) = 0$.
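Spelled out: if $f$ is $L$-Lipschitz in the $\ell_\infty$-norm used in the proof and $f(y) = 0$ at some test point $y$, then for any candidate minimizer $x$,

$$f(x) \;\ge\; f(y) - L\|x - y\|_\infty \;=\; -L\|x - y\|_\infty,$$

so $f$ can only reach the value $-\epsilon$ at points whose distance from every test point is at least $\epsilon / L$. The center of an untouched grid cell of side $1/p$, with $p = \left\lfloor \frac{L}{2\epsilon} \right\rfloor$, is at $\ell_\infty$-distance at least $\frac{1}{2p} \ge \frac{\epsilon}{L}$ from every test point, which is exactly the room the adversary needs.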

What if $f$ is smooth? There are so-called $C^{\infty}$ "bump" functions that are smooth everywhere, yet are identically $0$, together with all of their derivatives, outside of a small region. You can use such a bump function in the same argument used to prove Theorem 1.1.2.
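For instance, here is a sketch of such a bump in Python (the scaling and the `radius` parameter are my choices; for the full argument the depth and radius have to be balanced against the Lipschitz constant $L$, which costs only constant factors and is consistent with the quoted remark that the smooth bounds are "not much better"):

```python
import numpy as np

# The classic C-infinity bump: exp(-1 / (1 - r^2)) for r < 1, and 0
# otherwise. It is smooth everywhere, and it vanishes, together with
# every one of its derivatives, on the boundary r = 1.

def bump(x, center, radius, eps):
    """A smooth dip of depth eps, hidden inside a ball around `center`."""
    r2 = np.sum((np.asarray(x) - np.asarray(center)) ** 2) / radius ** 2
    if r2 >= 1.0:
        return 0.0
    # The factor np.e normalizes the depth so the minimum value is -eps.
    return -eps * np.e * np.exp(-1.0 / (1.0 - r2))

# Hide the dip at the center of the unexplored cell found above.
print(bump([0.125, 0.375], center=[0.125, 0.375], radius=0.1, eps=0.25))
# ~ -0.25 at the hidden minimum ...
print(bump([0.6, 0.7], center=[0.125, 0.375], radius=0.1, eps=0.25))
# ... and exactly 0.0 at every one of the method's test points
```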

What if a higher-order method is used? Remember that the bump function, together with all of its derivatives, equals $0$ at each of the test points; there is absolutely no information available to your algorithm to tell where the bump is hidden!
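If you want to convince yourself of this, here is a small symbolic sanity check with SymPy (a check I added, not part of the book):

```python
import sympy as sp

# Symbolic sanity check: every derivative of the 1-D bump profile tends
# to 0 at the edge of its support, so an oracle of ANY order, queried
# outside the bump, returns exactly what the zero function would return.
r = sp.symbols('r')
psi = sp.exp(-1 / (1 - r**2))

for k in range(4):  # the first few derivatives; the pattern continues
    print(k, sp.limit(sp.diff(psi, r, k), r, 1, '-'))
# each line prints: k 0
```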

You're ultimately going to need properties of the function beyond smoothness and Lipschitz continuity to get better results. As you'll see if you read further in the book, you can do better, for example, if $f$ is convex and has a Lipschitz-continuous gradient.