Convergence of the gradient descent algorithm


How to show that for $r>0$ small enough, the sequences $(x_n)_n$ and $(y_n)_n$ defined by $$ x_{n+1} = x_n -4r(x_n^3+y_n)\\ y_{n+1} = y_n -4r(y_n^3+x_n) $$ converge for any initial condition $(x_0,y_0)\in \mathbb{R}^2$?

What I tried :

  • It seems these recurrence relations can be obtained from gradient descent with constant rate $r$ applied to the function $f(x,y)=x^4+y^4+4xy$. This function has two minima, $(-1,1)$ and $(1,-1)$, so one should expect the sequences to converge to one of these points depending on the initial condition (whether it lies on one side of the line $y=x$ or the other). However, to prove convergence of the gradient descent algorithm one usually requires $f$ to be an $\alpha$-convex function (which is not the case here, since the eigenvalues of $\nabla^2 f$ may fail to be positive) and $\nabla f$ to be Lipschitz.
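The expected behaviour is easy to check numerically. Here is a quick sketch of the iteration (the step size $r=0.01$, the step count, and the two sample initial conditions are my own arbitrary choices):

```python
# Gradient descent iteration for f(x, y) = x^4 + y^4 + 4xy, i.e.
# x_{n+1} = x_n - 4r(x_n^3 + y_n),  y_{n+1} = y_n - 4r(y_n^3 + x_n).

def iterate(x0, y0, r=0.01, steps=10_000):
    x, y = x0, y0
    for _ in range(steps):
        x, y = x - 4 * r * (x**3 + y), y - 4 * r * (y**3 + x)
    return x, y

# A starting point below the line y = x should land near (1, -1),
# one above it near (-1, 1).
print(iterate(0.5, -0.3))
print(iterate(-0.2, 0.7))
```

With these parameters the two runs settle very close to $(1,-1)$ and $(-1,1)$ respectively, matching the basin picture separated by the line $y=x$.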

BEST ANSWER

Let $p_n=(x_n,y_n)$ and $\|p_n\|=\max(|x_n|,|y_n|)$. It is not hard to show that $\|p_{n+1}\|<\|p_n\|$ whenever $\|p_n\|\geq2$ and $r$ is small enough, so whenever $p_n$ goes beyond the boundary $\|p_n\|=2$ it is pushed back inside. Moreover, when $\|p_n\|\leq2$ the gradient $\nabla f(p_n)$ is bounded, and therefore $\|p_{n+1}\|\leq3$ for small enough $r$. Together these two facts imply that $p_n$ is eventually trapped in $\|p_n\|\leq3$, which means $\nabla^2f$ is bounded along the entire iteration (where the bound depends only on the initial condition).

Let this bound be $L$, i.e. $\nabla^2f(p_n)\preceq LI$ for all $n$. Then by Taylor expansion and the relation $p_{n+1}=p_n-r\nabla f(p_n)$ we have \begin{align*} f(p_{n+1})&=f(p_n)-r\|\nabla f(p_n)\|_{L_2}^2+\frac12r^2\big\langle\nabla f(p_n),\nabla^2f(\xi)\nabla f(p_n)\big\rangle\\[.5em] &\leq f(p_n)+\frac r2(rL-1)\|\nabla f(p_n)\|_{L_2}^2, \end{align*} where $\xi$ is on the line segment between $p_n$ and $p_{n+1}$ (thus $\nabla^2f(\xi)\preceq LI$). Choosing $r$ small enough that $rL<1$, we see that $f(p_n)$ is strictly decreasing until $\nabla f(p_n)=0$; since $f$ is bounded below, this guarantees convergence.
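The descent inequality can also be checked along a concrete run. Below, $L=112$ is a crude bound on $\|\nabla^2 f\|$ over the box $\max(|x|,|y|)\leq 3$ (since $\|\nabla^2 f\|\leq 12\cdot 9+4$ there), and the step size $r=0.005<1/L$ and starting point are my own choices:

```python
# Verify f(p_{n+1}) <= f(p_n) + (r/2)(rL - 1) * ||grad f(p_n)||^2 along a
# trajectory, with L = 112 a crude Hessian bound on max(|x|, |y|) <= 3.

def f(x, y):
    return x**4 + y**4 + 4 * x * y

def grad(x, y):
    return 4 * (x**3 + y), 4 * (y**3 + x)

r, L = 0.005, 112.0  # r < 1/L, so the bound forces strict descent
x, y = 1.5, 2.0
monotone = True
for _ in range(2000):
    gx, gy = grad(x, y)
    nx, ny = x - r * gx, y - r * gy
    bound = f(x, y) + 0.5 * r * (r * L - 1) * (gx**2 + gy**2)
    monotone = monotone and f(nx, ny) <= bound + 1e-12
    x, y = nx, ny
print(monotone)  # True
```

The check passes because the trajectory stays inside the box (its level set $\{f\leq f(p_0)\}$ is contained there), so the Hessian bound applies at every step.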


In fact, this argument generalises to other examples, as already discussed in hauau's answer.


Actually, gradient descent converges to a stationary point $x^*$ with $\nabla f(x^*) = 0$ even if the function is not convex (although in that case $x^*$ might not even be a local minimum; it might be a saddle point, for example), see e.g. Nesterov, Introductory Lectures on Convex Optimization, Section 1.2.3. The gradient in your case is continuous, so it is Lipschitz on any compact subset of $\mathbb R^2$, and gradient descent converges under these conditions.
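For the question's $f$, this caveat is visible directly: the line $y=x$ is invariant under the iteration, and on it the recursion reduces to $x_{n+1} = x_n - 4r(x_n^3 + x_n)$, which is driven to the saddle point $(0,0)$ (the Hessian there has eigenvalues $\pm4$) rather than to a minimum. A sketch with an assumed $r=0.01$:

```python
# Initial conditions on the invariant line y = x converge to the
# saddle point (0, 0) of f(x, y) = x^4 + y^4 + 4xy, not to a minimum.

def gd(x0, y0, r=0.01, steps=5000):
    x, y = x0, y0
    for _ in range(steps):
        x, y = x - 4 * r * (x**3 + y), y - 4 * r * (y**3 + x)
    return x, y

x, y = gd(0.5, 0.5)
print(abs(x) < 1e-8 and x == y)  # True: the limit is the stationary point (0, 0)
```

So "converges to a stationary point" is exactly the right statement here: the limit depends on the initial condition and need not be one of the two minima.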