Gradient descent: step size for a $C^{\infty}$ coercive function

455 Views Asked by At

Let $f: U \to \mathbb R$ be a $C^{\infty}$ function where $U$ is an open connected subset of $\mathbb R^n$. $f$ is coercive, i.e., $f(x) \to +\infty$ as $\|x\| \to \partial U$. This is equivalent to the compactness of sublevel set $\{x \in U: f(x) \le \alpha\}$.

We would like to use gradient descent to optimize this function. Suppose we don't know how to estimate the global Lipschitz property of the gradient (it's also possible there is no such constant globally). Now choose some initial condition $x_0$. The set $K = \{x \in U: f(x) \le f(x_0)\}$ is compact. Suppose we have the ability to bound the norm of Hessian over $K$, i.e., \begin{align*} c = \max_{x \in K} \{\|\nabla^2f(x)\|\}, \end{align*} where $\|\nabla^2 f(x)\| = \sup_{\|y\|_2 =1} \langle \nabla^2 f(x) y, y \rangle$. I am wondering whether the gradient descent scheme with step size $1/c$ will converge. The iterates are generated by following rule \begin{align*} x_{k+1} = x_k - \frac 1 c \nabla f(x_k). \end{align*} If we let $h := f|_K: K \to \mathbb R$ be the restriction of $f$, then above scheme is gradient descent for a function with $c$-Lipschitz continuous gradient. But there is a problem for me in applying the standard analysis for this function class. In a standard analysis, we have \begin{align*} f(x_k - \frac 1 c \nabla f(x_k) ) = f(x_k) - \frac {1} {c} \|\nabla f(x_k)\|^2 - \frac{1} {c} \int_{0}^{1} \langle \nabla f(x_k - t \frac{1}{c} \nabla f(x_k) ) - \nabla f(x_k), \nabla f(x_k) \rangle dt. \end{align*} Ideally we would like then apply mean value theorem for $\nabla f(x_k - t \frac{1}{c} \nabla f(x_k) ) - \nabla f(x_k)$, but how do we know the "mean value" lies on the compact set $K$.

2

There are 2 best solutions below

0
On

Since $K$ is compact and $f$ is $C^2$, then $\nabla f$ is Lipschitz in $K$, let's call such Lipschitz constant $L_K$. You can then start with some estimate $c$ for $L_K$, and increase it in case (i) your step falls out of $K$, that is if $f(x_{k+1}) > f(x_0)$, or (ii) $x_{k+1}$ and $x_k$ don't satisfy the quadratic upper bound

$$ f(x_{k+1}) \leq f(x_k) + \langle\nabla f(x_k), x_{k+1} - x_k \rangle + \frac{c}{2}\|x_{k+1} - x_k\|^2, $$

since if $c \geq L_K$ then the above bound holds between any pair of points in $K$.

0
On

I answered this question here. You can indeed prove that that $f(x-\alpha \nabla f(x)) \leq f(x)$ for any $\alpha < 2/c$, so this holds in particular for $\alpha = 1/c$. Here is a copy of what I wrote:$\newcommand{\T}{x}\newcommand{\al}{\alpha}\newcommand{\bal}{\bar{\alpha}}$

Define $U_\al = \{ \T-t\al\nabla f(\T) \mid t \in [0,1], \T\in U_0 \}$ and the continuous function $L(\al) = \sup_{\T \in U_\al} \lVert{\nabla^2 f(\T)}\rVert$. Notice that $U_0 \subset U_{\al}$ for all $\al < \al'$. We prove that $\al L(\al) < 2$ implies $U_\al = U_0$ and in particular, $L(\al) = L(0) = L$. By Taylor expansion,

$$f(\T-t\al\nabla f) = f(\T) - \al \lVert{\nabla f(\T)}\rVert^2 + \frac{t^2\al^2}{2}\nabla f(\T)^T\nabla^2 f(\T-t'\al\nabla f)f(\T) $$

for some $t' \in [0,t] \subset [0,1]$. Since $\T-t'\al\nabla f \in U_\al$, it follows that

$$ f(\T-t\al\nabla f) \leq f(\T) -\al\lVert{\nabla f(\T)}\rVert^2(1-\al L(\al)/2) \leq f(\T) $$

for all $\al L(\al) < 2$. In particular, $\T-t\al\nabla f \in U_0$ and hence $U_\al = U_0$. We conclude that $\al L(\al) < 2$ implies $L(\al)=L$, implying in turn $\al L < 2$. We now claim the converse, namely that $\al L < 2$ implies $\al L(\al) < 2$. For contradiction, assume otherwise that there exists $\al' L < 2$ with $\al'L(\al') \geq 2$. Since $\al L(\al)$ is continuous and $0 L(0) = 0 < 2$, there exists $\bal \leq \al'$ such that $\bal L < 2$ and $\bal L(\bal) = 2$. This is in contradiction with continuity:

$$ 2 = \bal L(\bal) = \lim_{\al\to\bal^-} \al L(\al) = \lim_{\al\to\bal^-} \al L = \bal L \,. $$

Finally we conclude that $U_\al = U_0$ for all $\al L < 2$. In particular, $\T_0 \in U_0$ implies $\T_k \in U_0$ by induction.