Gradient Descent for analytic function on a compact set

457 Views Asked by At

Suppose $f: K \to \mathbb R$ is an analytic function where $K \subset \mathbb R^n$ is a compact subset. Let us assume $f$ is not constant and $f$ achieves minimum at $\text{int}(K)$. Let $\beta = \max_{x \in K} \|\nabla^2 f(x)\|_2$, which is to say the gradient mapping is Lipschitz $\|\nabla f(x) - \nabla f(y)\|_2 \le \beta\|x-y\|_2$. Now let us consider an iterative gradient descent scheme with initial point $x_0 \in \text{int}(K)$ \begin{align*} x_{k+1} = x_k - \frac 1 \beta \nabla f(x_k). \end{align*} Then it is not hard to show the sequence $\{f(x_k)\}_{k=0}^{\infty}$ is monotonically decreasing and thus converges to some limit $l \in \mathbb R$.

I am wondering whether the sequence of iterates $\{x_k\}$ is convergent. The only possibility I can think of is that the sequence oscillates between two points $x_*^1, x_*^2$ with $f(x_*^1) = f(x_*^2) = l$. But I could not imagine how could this happen?

There is discussion here in which $f$ is convex and in the answer, an example was constructed, i.e., nonconvergent $\{x_k\}$, but the function constructed is not analytic.

2

There are 2 best solutions below

0
On

If $x,y$ were two points such that $y = x - \frac{1}{\beta} \nabla f(x)$ and $x = y - \frac{1}{\beta} \nabla f(y)$ then $\nabla f(x) - \nabla f(y) = 2\beta(y-x)$ which is in contradiction to the condition on $\beta$. So at least the "bad" sequence cannot have only two points.

However, this technique does not rule the possibility that there are three points that we may cycle between.

0
On

This paper proves that gradient descent converges to a critical point if the function is analytic, provided the Armijo condition $$ f(x_k-\alpha\nabla f(x_k)) \leq f(x_k)-c\alpha\lVert \nabla f(x_k) \rVert $$ holds at each step $k$. In particular, starting with $x_0$ inside a compact set $K$, you can prove that $x_k \in K$ for all $k$ provided $\alpha < 2/L$ with $L = \sup_{x\in K} \lVert \nabla^2 f(x)\rVert$. This holds in particular for $\alpha = 1/K$ as you asked. I wrote the proof for this here, copied below.$\newcommand{\T}{x}\newcommand{\al}{\alpha}\newcommand{\bal}{\bar{\alpha}}$

Define $U_\al = \{ \T-t\al\nabla f(\T) \mid t \in [0,1], \T\in U_0 \}$ and the continuous function $L(\al) = \sup_{\T \in U_\al} \lVert{\nabla^2 f(\T)}\rVert$. Notice that $U_0 \subset U_{\al}$ for all $\al < \al'$. We prove that $\al L(\al) < 2$ implies $U_\al = U_0$ and in particular, $L(\al) = L(0) = L$. By Taylor expansion,

$$f(\T-t\al\nabla f) = f(\T) - \al \lVert{\nabla f(\T)}\rVert^2 + \frac{t^2\al^2}{2}\nabla f(\T)^T\nabla^2 f(\T-t'\al\nabla f)f(\T) $$

for some $t' \in [0,t] \subset [0,1]$. Since $\T-t'\al\nabla f \in U_\al$, it follows that

$$ f(\T-t\al\nabla f) \leq f(\T) -\al\lVert{\nabla f(\T)}\rVert^2(1-\al L(\al)/2) \leq f(\T) $$

for all $\al L(\al) < 2$. In particular, $\T-t\al\nabla f \in U_0$ and hence $U_\al = U_0$. We conclude that $\al L(\al) < 2$ implies $L(\al)=L$, implying in turn $\al L < 2$. We now claim the converse, namely that $\al L < 2$ implies $\al L(\al) < 2$. For contradiction, assume otherwise that there exists $\al' L < 2$ with $\al'L(\al') \geq 2$. Since $\al L(\al)$ is continuous and $0 L(0) = 0 < 2$, there exists $\bal \leq \al'$ such that $\bal L < 2$ and $\bal L(\bal) = 2$. This is in contradiction with continuity:

$$ 2 = \bal L(\bal) = \lim_{\al\to\bal^-} \al L(\al) = \lim_{\al\to\bal^-} \al L = \bal L \,. $$

Finally we conclude that $U_\al = U_0$ for all $\al L < 2$. In particular, $\T_0 \in U_0$ implies $\T_k \in U_0$ by induction.