Comparing the rate of convergence of steepest descent method


I am wondering whether we can compare the convergence rates of optimization methods when those rates are measured with different definitions of error.

For example, in the book Optimization for Data Analysis, the authors derive the convergence rate of the steepest descent method with constant stepsize $1/L$ in three cases: (1) general $L$-smooth, (2) convex and $L$-smooth, (3) strongly convex with modulus $m$ and $L$-smooth.

We have, from (3.7), that some iteration $k \leq T$ will be found with $\lVert \nabla f(x^{k}) \rVert \leq \epsilon$ provided $$T \geq \frac{2L(f(x^0) - f^*)}{\epsilon^2}.$$ For the general convex case, (3.8) gives $f(x^k)-f^*\leq \epsilon$ whenever $$k \geq \frac{L\lVert x^0 - x^* \rVert^2}{2 \epsilon}.$$ For the strongly convex case, (3.15) gives $f(x^k) - f^* \leq \epsilon$ for all $k$ satisfying $$k \geq \frac{L}{m}\log\left(\frac{f(x^0)-f^*}{\epsilon}\right).$$
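As a sanity check on the linear rate in the strongly convex case, here is a minimal numerical sketch of my own (not from the book; the quadratic $f(x) = \tfrac{1}{2}x^\top A x$ and the eigenvalues $L = 10$, $m = 1$ are my choices):

```python
import numpy as np

# Steepest descent with constant stepsize 1/L on a strongly convex
# quadratic f(x) = 0.5 * x^T A x, whose largest/smallest eigenvalues
# give L = 10 and m = 1; the minimum is f* = 0 at x* = 0.
A = np.diag([10.0, 1.0])
L, m = 10.0, 1.0

def f(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

x = np.array([1.0, 1.0])
gaps = []
for k in range(50):
    gaps.append(f(x))              # optimality gap f(x^k) - f*, since f* = 0
    x = x - (1.0 / L) * grad(x)    # constant stepsize 1/L

# Linear convergence: each step shrinks the gap by a factor <= 1 - m/L
ratios = [gaps[k + 1] / gaps[k] for k in range(len(gaps) - 1)]
print(max(ratios))                 # stays below 1 - m/L = 0.9
```

The per-step contraction factor staying bounded away from $1$ is exactly what "linear rate" means here, matching the $\log(1/\epsilon)$ iteration count in (3.15).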

The book says that the first two rates are sublinear and the last is linear. But the first case measures the error by the norm of the gradient, $\lVert \nabla f(x^{k}) \rVert$, while the latter two measure it by the optimality gap $f(x^k) - f^*$, the difference between the function value at iteration $k$ and the optimal value. Can we still say that the steepest descent method converges faster on strongly convex functions than on general $L$-smooth functions in this case? Or am I missing something?
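For reference, the one inequality I know that links the two error measures (valid for $L$-smooth $f$ with a minimizer, obtained by applying the descent lemma at $x^k$) is $$\lVert \nabla f(x^k) \rVert^2 \leq 2L\,\bigl(f(x^k) - f^*\bigr),$$ so a small optimality gap forces a small gradient norm, but I am not sure whether this is enough to make the three rates directly comparable.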