Bounding below the decrease obtained with gradient descent


I'm reading Nesterov's book Introductory Lectures on Convex Optimization. On p. 22 he proves that if $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and $\nabla f$ is Lipschitz continuous with constant $L$, then

$$|f(y) - f(x) - \langle \nabla f(x), y - x\rangle | \leq \frac{L}{2}\|y - x\|^2$$

This inequality is marked with the tag (1.2.5).

Then on page 26 he is considering the gradient method for unconstrained minimization of $f$. The iteration algorithm is that

$$x_{k+1} \leftarrow x_k - h \cdot \nabla f(x_k)$$

for some step size $h \in \mathbb{R}^{>0}$. He writes:

[Excerpt from p. 26: substituting $y = x_{k+1} = x_k - h f'(x_k)$ into (1.2.5), Nesterov obtains

$$f(x_{k+1}) \leq f(x_k) - h\,\|f'(x_k)\|^2 + \frac{h^2}{2} L \,\|f'(x_k)\|^2,$$

and minimizing the right-hand side over $h$ gives the step size $h = 1/L$, for which one step of the gradient method decreases the value of $f$ by at least $\frac{1}{2L}\|f'(x_k)\|^2$.]

A couple of questions:

  1. How does he get to the step $f(y) \leq f(x) + \langle f'(x), y-x \rangle + \frac{L}{2}\|y-x\|^2$? How does he get rid of the absolute value in the previous inequality (1.2.5)?
  2. He writes that "one step of the gradient method decreases the value...at least...$\frac{1}{2L}\|f'(x)\|^2$". I suppose he means: if we always choose the value of $h$ that makes our upper bound sharpest, right? But that is not what every variant of the method does. (He covers various ways of choosing the step size next, and of course they don't all choose $h=\frac{1}{L}$.) So it's not really true that the gradient method always decreases the value by this much, is it?
Best answer:
  1. It's a general fact that if $|z| \leq M$, then $z \leq M$. Applying this to (1.2.5) gives

$$f(y)-f(x)-\nabla f(x)^{T}(y-x) \leq \frac{L}{2} \| y- x \|^{2},$$

which rearranges to

$$f(y) \leq f(x) + \nabla f(x)^{T}(y-x) + \frac{L}{2} \| y-x \|^{2}.$$
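This one-sided bound is easy to sanity-check numerically. Below is a minimal sketch (not from Nesterov's book) using a quadratic $f(x) = \frac{1}{2}x^T A x$, whose gradient $Ax$ is Lipschitz with constant $L = \lambda_{\max}(A)$; the matrix, dimension, and random seed are all illustrative choices.

```python
import numpy as np

# Quadratic test function f(x) = 0.5 x^T A x with A symmetric PSD,
# so grad f(x) = A x is Lipschitz with constant L = lambda_max(A).
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T                       # symmetric positive semidefinite
L = np.linalg.eigvalsh(A).max()   # Lipschitz constant of the gradient

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

# Check f(y) <= f(x) + <grad f(x), y - x> + (L/2) ||y - x||^2 at random points.
for _ in range(1000):
    x = rng.standard_normal(5)
    y = rng.standard_normal(5)
    upper = f(x) + grad(x) @ (y - x) + 0.5 * L * (y - x) @ (y - x)
    assert f(y) <= upper + 1e-9   # holds up to floating-point rounding
print("upper bound held at all sampled points")
```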

  2. Yes, you understand this correctly: you get the guaranteed reduction of $\frac{1}{2L}\| \nabla f(x) \|^{2}$ if you pick $h=1/L$. Other choices of $h$, such as those produced by the Goldstein-Armijo rule, might not do as well.
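The guaranteed decrease with $h = 1/L$ can also be checked numerically. Here is a sketch on the same kind of quadratic test function as above (an illustrative choice, not Nesterov's example), verifying $f(x) - f(x - \frac{1}{L}\nabla f(x)) \geq \frac{1}{2L}\|\nabla f(x)\|^2$.

```python
import numpy as np

# Quadratic f(x) = 0.5 x^T A x; grad f is Lipschitz with L = lambda_max(A).
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B @ B.T
L = np.linalg.eigvalsh(A).max()

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

# One gradient step with h = 1/L must decrease f by at least ||g||^2 / (2L).
for _ in range(1000):
    x = rng.standard_normal(5)
    g = grad(x)
    decrease = f(x) - f(x - g / L)
    assert decrease >= g @ g / (2 * L) - 1e-9  # up to rounding
print("decrease bound held at all sampled points")
```

A step size like $h = 2/L$ would make the same quadratic upper bound vanish, so the guarantee is specific to minimizing the bound at $h = 1/L$.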