Gradient descent for differentiable convex functions


Suppose $f\colon\mathbb{R}^n\to\mathbb{R}$ is convex and differentiable, and assume that $f$ has a minimizer.

If $(x_k)$ is the sequence generated by exact gradient descent, must it converge to a minimizer?

Here "exact gradient descent" means that $x_{k+1} = x_k-t_k\nabla f(x_k)$ where it is assumed that $t_k$ is a minimizer of the function $t\mapsto f(x_k-t\nabla f(x_k))$ for $t\geq 0$ (the existence of $t_k$ is assumed for all $k$).

A reference or counterexample would be great. (I am aware of Wolfe's example demonstrating the importance of differentiability, and I am also aware that the result holds when $f$ is strictly convex and coercive.)


Answer 1 (score 3)

Here is something that can go wrong without strict convexity.

Define $f(x,y)=\max(0,|x|-1,|y|-1)^2.$ This isn't $C^1,$ but that can be fixed later. It attains its minimum value of $0$ on the square $|x|,|y|\leq 1.$ If we start at a point with $x,y>1$ and $x-1>2(y-1),$ then $f$ is locally equal to $(x-1)^2$ and has gradient $(2(x-1),0).$ The minimum of $f$ along the horizontal line of constant $y$ is $(y-1)^2,$ attained on a whole segment, and we can adversarially pick the next point $(x',y')$ to have $x'$ slightly less than $-1,$ so that $-x',y'>1$ with $y'-1>2((-x')-1).$ This is the same kind of inequality we started with for $(x,y),$ except rotated anti-clockwise by a right angle. Continuing in this way, we get a sequence of points whose limit set consists of the four corners $(\pm 1,\pm 1),$ and hence the sequence diverges by oscillation.
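A quick numerical sketch of this construction (the starting point $(3, 1.5)$ and the adversarial tie-break $x' = -1.1$ are my own illustrative choices):

```python
import numpy as np

def f(x, y):
    return max(0.0, abs(x) - 1.0, abs(y) - 1.0) ** 2

# Start in the region x, y > 1 with x - 1 > 2(y - 1): locally f = (x-1)^2,
# so the line search from (3, 1.5) moves along the horizontal line y = 1.5.
x, y = 3.0, 1.5
ts = np.linspace(-2.0, 4.0, 2001)
vals = [f(t, y) for t in ts]
print(min(vals))                    # ~0.25 = (y-1)^2: the minimum value on this line
# It is attained on the whole segment |x| <= 1.5, so the line search has ties.
# Adversarial tie-break: land at x' slightly less than -1.
xp, yp = -1.1, y
print(f(xp, yp))                    # 0.25: x' is also an exact-line-search minimizer
print(yp - 1 > 2 * ((-xp) - 1))     # True: the rotated version of the starting inequality
```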

To fix the non-differentiability, in the region $x,y>1$ and $(x-1)/(y-1)\in (1/2,2),$ replace $f$ by the function that sends $(1+t(1+\cos\theta),1+t(1+\sin\theta))$ to $4t^2$; here $t>0$ and $0<\theta<\pi/2.$ Along $(x-1)=2(y-1)$ this equals $(x-1)^2$ with horizontal gradient as required. The other boundary $(y-1)=2(x-1)$ is similar, and the other corners can be handled in the same way so that the function is even in $x$ and $y.$ The derivatives at $(\pm 1,\pm 1)$ are still zero.
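One can sanity-check this patch numerically. The following small sketch (sample values of $t$ and $\theta$ are my own) confirms that the reparametrised function agrees with $(x-1)^2$ on the boundary ray $(x-1)=2(y-1)$:

```python
import numpy as np

# The patch sends (1 + t(1 + cos θ), 1 + t(1 + sin θ)) to 4 t^2 for t > 0, 0 < θ < π/2.
t = 0.7
theta = 0.0                        # boundary θ -> 0 corresponds to (x-1) = 2(y-1)
x = 1 + t * (1 + np.cos(theta))    # x = 1 + 2t
y = 1 + t * (1 + np.sin(theta))    # y = 1 + t
print(4 * t ** 2, (x - 1) ** 2)    # both 1.96: the patch matches (x-1)^2 on this ray
```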

If the level sets are bounded, this should be the only thing that can go wrong: the sequence can diverge by oscillation, but all of its limit points are minimizers.

Answer 2 (score 7)

We have assumed that $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is convex, differentiable and has a minimiser. Denote the line-search function by $$g_k : \mathbb{R} \rightarrow \mathbb{R}, \qquad g_k(t) = f(x_k - t \nabla f(x_k)).$$

Lemma 1. If $\frac{dg_k}{dt}(0) = 0$ then $x_k$ is a minimiser of $f$.

Proof. By the chain rule, $\frac{dg_k}{dt}(t) = -\nabla f(x_k - t\nabla f(x_k)) \cdot \nabla f(x_k)$. Suppose that $\frac{dg_k}{dt}(0) = 0$; then $\nabla f(x_k)\cdot\nabla f(x_k) = \|\nabla f(x_k)\|^2 = 0$, so $\nabla f(x_k) = 0$, and since $f$ is convex this means $x_k$ is a minimiser. $\square$
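As a numerical check of this computation (on an illustrative quadratic of my own choosing), a central difference for $\frac{dg_k}{dt}(0)$ should agree with $-\|\nabla f(x_k)\|^2$:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite, so f is convex
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

xk = np.array([1.0, -2.0])
g = grad(xk)
gk = lambda t: f(xk - t * g)              # the line-search function g_k

h = 1e-6
fd = (gk(h) - gk(-h)) / (2 * h)           # central difference for g_k'(0)
print(fd, -np.linalg.norm(g) ** 2)        # both ~ -||grad f(x_k)||^2 = -10
```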

The function $g_k$ is,

  • Convex, as the composition of the convex function $f$ with the affine map $t \mapsto x_k - t\nabla f(x_k)$.
  • Non-constant, unless $x_k$ is already a minimiser: if $g_k$ were constant then $\frac{dg_k}{dt}(0) = 0$ and $x_k$ would be a minimiser by Lemma 1.

Lemma 2. If $x_{k+1} = x_k$ then $x_k$ is a minimiser.

Proof. The value $t_k$ is a minimiser of $g_k$ over $t \geq 0$. If $t_k = 0$, then $\frac{dg_k}{dt}(0) \geq 0$; since $\frac{dg_k}{dt}(0) = -\|\nabla f(x_k)\|^2 \leq 0$, it must be zero, and by Lemma 1 we are at a minimiser. Assume instead that $t_k \neq 0$. If $x_{k+1} = x_k$ then $t_k \nabla f(x_k) = 0$, so $\nabla f(x_k) = 0$ and $x_k$ is a minimiser. $\square$

Lemma 3. The sequence defined by $z_k = f(x_k)$ is strictly decreasing, unless some $x_k$ is a global minimiser of $f$, in which case $x_n = x_k$ for all $n\geq k$.

Proof. Since $t_k$ minimises $g_k$ over $t \geq 0$, we have $z_{k+1} = g_k(t_k) \leq g_k(0) = z_k$. Suppose equality holds. Then $t = 0$ also minimises $g_k$ over $t \geq 0$, so $\frac{dg_k}{dt}(0) \geq 0$; as in Lemma 2 this forces $\frac{dg_k}{dt}(0) = 0$, and by Lemma 1, $x_k$ is a minimiser. In that case $\nabla f(x_k) = 0$, so $x_n = x_k$ for all $n \geq k$. $\square$
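To see Lemma 3 in action on a convex but not strictly convex example (my own choice of $f(x,y) = (x+y)^2$, whose minimisers form the line $x + y = 0$), each exact step strictly decreases $f$ until the minimiser set is reached:

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: (x[0] + x[1]) ** 2   # convex, not strictly convex
grad = lambda x: np.array([2 * (x[0] + x[1]), 2 * (x[0] + x[1])])

x = np.array([2.0, 1.0])
for k in range(3):
    g = grad(x)
    if np.linalg.norm(g) < 1e-12:
        break                       # reached the minimiser line x + y = 0
    res = minimize_scalar(lambda t: f(x - t * g), bounds=(0.0, 1e3), method="bounded")
    x = x - res.x * g
    print(k, f(x))                  # strict decrease; here a single exact step suffices
```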

Theorem. Let $x^*$ be a minimiser of $f$, and suppose the iterates $(x_k)$ remain bounded. Then the sequence $z_k = f(x_k)$ converges to the minimum value $f(x^*)$.

Proof. By Lemma 3, $(z_k)$ is strictly decreasing unless some $x_k$ is a minimiser, in which case the claim is immediate. It is bounded below by $f(x^*)$, so by the monotone convergence theorem it converges to its infimum $\ell = \inf_k z_k \geq f(x^*)$. Since $(x_k)$ is bounded, some subsequence $x_{k_j} \to \bar{x}$, and $f(\bar{x}) = \ell$ by continuity. If $\nabla f(\bar{x}) \neq 0$, then $f(\bar{x} - \bar{t}\,\nabla f(\bar{x})) < \ell$ for some small $\bar{t} > 0$, and by continuity of $f$ and $\nabla f$ we would have $z_{k_j+1} \leq f(x_{k_j} - \bar{t}\,\nabla f(x_{k_j})) < \ell$ for large $j$, contradicting $z_k \geq \ell$. Hence $\nabla f(\bar{x}) = 0$, so $\bar{x}$ is a minimiser and $\ell = f(\bar{x}) = f(x^*)$. $\square$
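As a numerical illustration of the theorem (an illustrative run; the quadratic, its conditioning, and the starting point are my own choices), the values $f(x_k)$ under exact line search decrease monotonically to the minimum value:

```python
import numpy as np
from scipy.optimize import minimize_scalar

A = np.array([[10.0, 0.0], [0.0, 1.0]])   # ill-conditioned, to make the zig-zagging visible
f = lambda x: 0.5 * x @ A @ x             # minimum value 0, attained at x* = (0, 0)
grad = lambda x: A @ x

x = np.array([1.0, 10.0])
vals = [f(x)]
for _ in range(20):
    g = grad(x)
    res = minimize_scalar(lambda t: f(x - t * g), bounds=(0.0, 1e3), method="bounded")
    x = x - res.x * g
    vals.append(f(x))
print(vals[:5], vals[-1])                 # strictly decreasing, tending to f(x*) = 0
```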