This paper is by a well-known professor and has several hundred citations, so someone out there must understand it. If you are experienced in optimization, please have a look at this.
In Coordinate Descent Algorithms
https://link.springer.com/article/10.1007%2Fs10107-015-0892-3
by Stephen J. Wright, the following assumption is stated (page 14):
Assumption 1. The function $f$ in $\min_x f(x)$ is convex and uniformly Lipschitz continuously differentiable, and attains its minimum value $f^{*}$ on a set $\mathcal{S}$. There is a finite $R_{0}$ such that the level set for $f$ defined by $x^{0}$ is bounded, that is, $$ \max _{x^{*} \in \mathcal{S}} \max _{x}\left\{\left\|x-x^{*}\right\| : f(x) \leq f\left(x^{0}\right)\right\} \leq R_{0}. $$
Then in the part dealing with accelerated randomized coordinate descent (page 19) it is stated:
Theorem 2. *Suppose that Assumption 1 holds, and define $$ S_{0} := \sup _{x^{*} \in \mathcal{S}} L_{\max }\left\|x^{0}-x^{*}\right\|^{2}+\left(f\left(x^{0}\right)-f^{*}\right) / n^{2}. $$ Then for all $k \ge 0$ we have*
$$ \begin{aligned} E\bigl(f(x^k)\bigr) - f^* &\le S_0 \frac{\sigma }{L_{\max}} \left[ \left( 1+ \frac{\sqrt{\sigma /L_{\max}}}{2n} \right) ^{k+1} - \left( 1- \frac{\sqrt{\sigma /L_{\max}}}{2n} \right) ^{k+1} \right] ^{-2} \\ &\le S_0 \left( \frac{n}{k+1} \right) ^2, \end{aligned} $$
where $\sigma$ is the modulus of strong convexity and $L_{\max}$ is the coordinate Lipschitz constant.
He then concludes the following: the term $$ \left(1+\frac{\sqrt{\sigma / L_{\max }}}{2 n}\right)^{k+1} $$ eventually dominates the second term $$ \left(1-\frac{\sqrt{\sigma / L_{\max }}}{2 n}\right)^{k+1}, $$ so the linear convergence rate suggested by this expression is significantly faster than the corresponding rate $$ E\left[f\left(x^{k}\right)\right]-f^{*} \leq\left(1-\frac{\sigma}{n L_{\max }}\right)^{k}\left(f\left(x^{0}\right)-f^{*}\right) \quad \forall k \geq 1 $$ for Algorithm 3 (randomized coordinate descent without acceleration).
Finally, my problem: I just can't follow his logic and can't see why this rate is significantly faster than the other one.
I am very thankful for any hint.
To see why the convergence rate is faster, compare the two per-iteration quantities: $1-\frac{\sigma}{n L_{\max}}$ and $\Bigl[ \bigl(1+\frac{\sqrt{\sigma/L_{\max}}}{2n}\bigr)^{k+1} - \bigl(1-\frac{\sqrt{\sigma/L_{\max}}}{2n}\bigr)^{k+1}\Bigr]^{-2}$. As $k$ grows, the latter can be approximated by
$$ \frac{1}{\biggl( \bigl(1+\frac{\sqrt{\sigma/L_{\max}}}{2n}\bigr)^{k+1} - \bigl(1-\frac{\sqrt{\sigma/L_{\max}}}{2n}\bigr)^{k+1} \biggr)^2} \approx \frac{1}{\biggl( \bigl(1+\frac{\sqrt{\sigma/L_{\max}}}{2n}\bigr)^{k+1} \biggr)^2} = \Biggl( \frac{1}{\bigl(1+\frac{\sqrt{\sigma/L_{\max}}}{2n}\bigr)^2} \Biggr)^{k+1}. $$
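A quick numerical check (with assumed values for $\sigma/L_{\max}$ and $n$, not taken from the paper) confirms that the $(1-x)^{k+1}$ term becomes negligible next to $(1+x)^{k+1}$ as $k$ grows, so dropping it is harmless:

```python
import math

# Assumed illustrative values: sigma/L_max = 0.01, n = 100 coordinates.
x = math.sqrt(0.01) / (2 * 100)   # x = sqrt(sigma/L_max) / (2n)

for k in (10, 1000, 100000):
    # Size of the dropped term relative to the kept term.
    ratio = (1 - x) ** (k + 1) / (1 + x) ** (k + 1)
    print(k, ratio)
# The ratio tends to 0, justifying the approximation above for large k.
```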
Now, using the first-order Taylor expansion $\frac{1}{(1+x)^2} \approx 1-2x$ (valid for small $x$), the rate can be further approximated by
$$ \frac{1}{\bigl(1+\frac{\sqrt{\sigma/L_{\max}}}{2n}\bigr)^2} \approx 1 - \frac{\sqrt{\sigma/L_{\max}}}{n} < 1 - \frac{\sigma/L_{\max}}{n}, $$
where the last inequality holds because $\sigma \le L_{\max}$, so $\sqrt{\sigma/L_{\max}} \ge \sigma/L_{\max}$.
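To make the comparison concrete, here is a small sketch (again with assumed values for $\sigma/L_{\max}$ and $n$) that evaluates the exact accelerated contraction factor, its Taylor approximation, and the unaccelerated factor:

```python
import math

# Assumed illustrative values (not from the paper):
ratio = 0.01          # sigma / L_max, assumed < 1
n = 100               # number of coordinates

x = math.sqrt(ratio) / (2 * n)       # sqrt(sigma/L_max) / (2n)

exact = 1.0 / (1.0 + x) ** 2         # exact per-iteration factor (accelerated)
taylor = 1.0 - 2.0 * x               # first-order Taylor approximation
plain = 1.0 - ratio / n              # factor for unaccelerated coordinate descent

print(exact, taylor, plain)
# The accelerated factor (about 1 - sqrt(ratio)/n) is markedly smaller,
# i.e. contracts faster, than the plain factor (about 1 - ratio/n).
```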
The improvement by a factor of $\sqrt{L_{\max}/\sigma}$ in the number of iterations can be seen by the following formula:
$$ (1-\rho)^k < \epsilon \Rightarrow k > \frac{\log (1/\epsilon)}{\log(1/(1-\rho))} \approx \frac{1}{\rho} \log (1/\epsilon) , $$
where the approximation $\log(1/(1-\rho)) \approx \rho$ holds for small $\rho$, and you substitute $\rho = \frac{\sigma/L_{\max}}{n}$ and $\rho = \frac{\sqrt{\sigma/L_{\max}}}{n}$, respectively.
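The iteration-count formula above can be checked numerically; with assumed values $\sigma/L_{\max} = 0.01$ and $n = 100$ (so $\sqrt{L_{\max}/\sigma} = 10$), the ratio of iteration counts should come out close to 10:

```python
import math

# Assumed illustrative values (not from the paper):
sigma_over_L = 0.01   # sigma / L_max
n = 100               # number of coordinates
eps = 1e-6            # target accuracy

def iters_needed(rho, eps):
    """Smallest k with (1 - rho)^k < eps."""
    return math.ceil(math.log(1.0 / eps) / -math.log(1.0 - rho))

k_plain = iters_needed(sigma_over_L / n, eps)             # unaccelerated rate
k_accel = iters_needed(math.sqrt(sigma_over_L) / n, eps)  # accelerated rate

print(k_plain, k_accel, k_plain / k_accel)
# The ratio is close to sqrt(L_max / sigma) = 10 in this setup.
```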