Nesterov's momentum derivation

101 Views Asked by At

On page 75 in Sutskever's thesis http://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf

In equation (7.5) setting $a_0=1$,

$a_{t+1} = (1+\sqrt{4 a_t^2 + 1})/2 $

The author said, "to understand the sequence $a_t$ we note that the function $x \rightarrow (1+ \sqrt{4x^2+1})/2$ quickly approaches $x \rightarrow x + 0.5 $ from below as $x \rightarrow \infty$,"

then he claims

"so $a_t \approx (t+4)/2 $ for large $t$ "

I couldn't figure out how he made the deduction here. It is not at all obvious to me how one might even guess the express. If we brute-force it we could probably guess $t/2$ but the $t+4$ term seems to come from nowhere at the large number limit.

1

There are 1 best solutions below

2
On

(see image below)

Let $f(x):=(1+ \sqrt{4x^2+1})/2$. It is the cartesian equation of a hyperbola (H) ; the interesting half branch being represented in black in Fig. 1 ; its asymptote (A), represented in red, has equation $y=g(x):=x+1/2$ (for $x>0$).

The other line, let us call it (L), has equation $y=x$. Its usage is classical in geometric representation of iterative methods $u_{n+1}=\varphi(u_n)$. take a look for instance to (http://wwwf.imperial.ac.uk/metric/metric_public/numerical_methods/iteration/fixed_point_iteration.html) which gives different instances of staircase patterns generated by sequences of points $(u_k, u_k), (u_k,\varphi(u_k)$.

Comparison between the two iterations $$\begin{cases}a_{n+1}=f(a_n)\\ t_{n+1}=g(t_n)\end{cases}$$ can be transferred to a comparison of the staircase pattern (in blue) between (H) and (L) and the stair case pattern (in black) between (A) and (L) ; for the

The second staircase pattern (blue points with coordinates $(1,3/2), (3/2,3/2), (3/2,2),...$ is asymptotic to the first one.

In the case of the second iteration, it is clear that the increasing rate is constant, i.e., it's an arithmetic progression, thus given by the formula you desired to understand.

enter image description here