Proving convergence in recurrence relation with recurrence $x_{n+1} = \gamma(x_n - x_{n-1})$ with $\gamma <1$ (momentum gradient descent)

116 Views Asked by At

Suppose we have a recurrence relation that $x_{n+1} = \alpha(x_n - x_{n-1})$ with $0 < \alpha< 1$.

Then I want to show that $f(x_n) \rightarrow 0$ as $n \rightarrow \infty$ where $f(x) = x^2$.

I think this should hold for any starting points, but if not, let's say $x_0$ is arbitrary and $x_1 = 0$.

This seems intuitive enough, and by plotting it in python it does indeed reliably converge to $0$.

This was obtained by simplifying gradient descent on $x^2$ with added momentum:

i.e, we use momentum gradient descent $x_{n+1} = x_n - \eta g_n $ for some learning rate $\eta$, and with momentum updated gradient $g_n = (1-\gamma)g_{n-1} + \gamma \nabla f(x_n)$. ($g_{-1} = 0$)

I wanted to show that if $\eta > 1$, then this momentum updated gradient descent converges as a solution (in contrast to vanilla GD which wouldn't update) IF we pick an appropriate value of $\gamma$:

So what I did was find that $x_{t+1} = x_t - \eta\big[(1-\gamma)g_{t-1} + \gamma\nabla f(x_t)\big] = x_t(2-\gamma - 2\eta\gamma) - x_{t-1}(1-\gamma)$ using the facts that $\nabla f(x) = 2x$ and $x_t - x_{t-1} = -2\eta g_{t-1}$.

Then I thought to pick $\gamma = 1/2\eta$ which leads me to $x_{t+1} = (1-\frac{1}{2\eta})(x_t - x_{t-1})$

Now I'm a bit stuck showing convergence of $f(x)$ using these updates.

(Note that with this scheme $x_1$ will always be $0$).

Should I be picking a more convenient value of $\gamma$? Or can I show convergence using this recurrence relation?

1

There are 1 best solutions below

2
On

If you have the recurrence

$x_n = \gamma(x_{n-1}-x_{n-2})$ with $0 < \gamma < 1$ you can make the ansatz

$x_n = \lambda^n$ and see that $\lambda$ must satisfy the equation

$\lambda^2 -\gamma\lambda+\gamma=0$. This has roots

$\lambda_{\pm} =\frac{\gamma\pm \sqrt{\gamma^2-4\gamma}}{2} $

Since $\gamma < 1$, $\gamma^2 < \gamma< 4\gamma$ and hence we have complex roots with positive real part less than 1. They are complex conjugate with modulus $\vert \lambda_{\pm}\vert = \sqrt{\gamma}$. So we have that $\vert\lambda_{\pm}\vert^n = \gamma^{n/2}\to 0$ as $n\to\infty$.

Then our solution is $x_n = A_+\lambda_+^n+A_-\lambda_-^n$. And for any initial condition \begin{align}\lim_{n\to\infty}\vert x_n\vert \leq \lim_{n\to\infty}\vert A_+\vert\lambda_+\vert^n+\vert A_-\vert\vert\lambda_-\vert^n = 0\ . \end{align}

By the continuity of $f(x)=x^2$ we have $f(x_n)\to 0$ as $n\to\infty$.