Lipschitz Smoothness and Nesterov's Accelerated Gradient Descend

366 Views Asked by At

I am reading about Nesterov's Accelerated Gradient Descend, and trying to apply it to my application (Computerised Tomography), which essentially means I am solving a linear problem of the form

$$ \hat{x}=\underset{x}{\text{argmin}}\left\{ \lVert Ax-b\rVert^2\right\} $$

Regardless of the complexity of the proof, ultimately the algorithm seems fairly easy to implement, as it is just a slight modification from the classic gradient descend with some specific parameter updates. Nesterov's acceleration ends in an update that looks like:

$$ y_{s+1}=x_{x}-\frac{1}{\beta}\nabla f(x_s)\\ x_{s+1}=(1-\gamma_s)y_{s+1}+\gamma_s y_s $$ for iteration number $s$ and $\gamma_s=\frac{1-\lambda_s}{\lambda_{s+1}}$, being $\lambda_S=\frac{1+\sqrt{1+4{\lambda_{s-1}^2}}}{2}$.

However it relies in the $\beta$-smoothness of the function $f$, and $\beta$ is used in as a parameter.

$f$ is $ \lVert Ax-b\rVert^2$

What value should I give to $\beta$? How can I know how $\beta$-smooth my function is?

1

There are 1 best solutions below

1
On BEST ANSWER

The gradient of $f$ is $\nabla f(x) = 2 A^T(Ax-b)$. To use Nesterov's method we need a Lipschitz constant for $\nabla f$ (not $f$). Notice that \begin{align} \| \nabla f(x)-\nabla f(y) \| &= \|2 A^T A(x-y) \| \\ &\leq 2 \| A^T A \| \|x-y\| \\ &= 2 \| A\|^2 \|x-y\|^2. \end{align} So a Lipschitz constant for $\nabla f$ is $$ \beta= 2 \|A\|^2. $$