Sub-linear convergence rate in convex optimization

708 Views Asked by At

I learned the definition of sublinear rate of convergence from wikipedia, but I am not quite sure how to use it in practice. Specifically, I am studying the proof for linear and sub-linear rate of convergence for gradient descent algorithm under different assumptions, here I called them $\textbf {case 1}$ and $\textbf {case 2}$.

If $f^*$ is the optimal solution(minim cost) and $f(x_k)$ is cost at iteration $k$, then $\Delta_k$ is defined as $\Delta_k = f(x_k) - f^*$.

$\textbf {case 1}$: When we have $\Delta_{k+1} \leq (1-\frac{\mu}{L})\Delta_{k}$ in which $\mu , L \in \mathbb{R}$ and $\mu < L$, $\rightarrow$ it shows the linear rate of convergence for function $f$. ( I understood this part).

$\textbf{case 2}:$ Under different assumption, we will have the following recursion:

$\Delta_{k+1} \leq \Delta_k − C\Delta_k^2$, where $C = \frac{1}{2L\|x_0−x^* \|}$

It follows:

$\frac{1}{\Delta_{k}} \leq \frac{1}{\Delta_{k+1}} − C\frac{\Delta_k}{\Delta_{k+1}} $

$ \frac{1}{\Delta_{k+1}} \geq \frac{1}{\Delta_{k}}+c \geq \quad... \quad \geq \frac{1}{\Delta_{0}}+(k+1)c$

which shows $\Delta_k \leq O(\frac{1}{k})$ and has a sublinear convergence.

I don't know how exactly relate the definition of sublinear rate of convergence from wikipedia to the result obtained in case 2. Can anyone help to show how we can conclude the convergence rate based on above recursion(case 2)?

Thank you.

1

There are 1 best solutions below

2
On

I think the lecture notes are somewhat carelessly worded. At the top of page 5 it says that if assumption 1 is satisfied you have sublinear convergence but if assumptions 1 and 2 are satisfied you have linear convergence. This seems to imply the convergence can be both sublinear and linear, which contradicts the wikipedia discussion cited by OP.

I think the resolution is that the lecture notes author should have written that the best you can conclude from assumption 1 alone is that the convergence is sublinear and if both assumptions are satisfied you can draw the stronger conclusion that the convergence is linear.

In particular, suppose the sequence $\Delta_k\sim c/k$, converging to zero about as slowly as case 2 allows. Then the ratio $\mu_k=\Delta_{k+1}/\Delta_k$ tends to $\mu=1$ and hence $\Delta_k$ is sublinear in the sense of the Wikipedia article. (Reading between the W article's lines: under basic definition it does not define $\mu_k$,...).