Convergence of a "simple" stochastic algorithm


In this question, we consider the convergence of the following algorithm.

Let $f(x) = \frac{1}{2}x^2$ be our objective function. We define a so-called forcing function $\rho(\alpha) = \frac{1}{2}\alpha^2$. Then given an initial point $x_0$ and initial step size $\alpha_0$, we define the following iteration:

  1. generate a random number $d_k$ with probability $\mathbb{P}(d_k = 1) = \mathbb{P}(d_k = -1) = \frac{1}{2}$;
  2. if $f(x_k+\alpha_k d_k)<f(x_k)-\rho (\alpha_k)$ (i.e. $\alpha_{k}<-d_kx_k$), then $x_{k+1} = x_k+\alpha_k d_k$ and $\alpha_{k+1} = 2\alpha_k$; if not, $x_{k+1} = x_{k}$ and $\alpha_{k+1} = \frac{1}{2}\alpha_k$.
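For intuition, the iteration is easy to simulate; here is a minimal Python sketch (the function `run` and its default parameters are my own choices, not part of the question):

```python
import random

def run(x0=1.0, alpha0=1.0, steps=500, seed=0):
    """Simulate the doubling/halving step-size scheme for f(x) = x^2/2."""
    f = lambda t: 0.5 * t * t        # objective f(x) = x^2 / 2
    rho = lambda a: 0.5 * a * a      # forcing function rho(alpha) = alpha^2 / 2
    rng = random.Random(seed)
    x, alpha = x0, alpha0
    for _ in range(steps):
        d = rng.choice((-1, 1))      # d_k = +-1, each with probability 1/2
        if f(x + alpha * d) < f(x) - rho(alpha):   # sufficient-decrease test
            x, alpha = x + alpha * d, 2.0 * alpha  # accept and double the step
        else:
            alpha = 0.5 * alpha                    # reject and halve the step
    return x, alpha
```

Running this with different seeds suggests that $x_k \to 0$, which is the behaviour asked about below.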

I am curious whether this algorithm has the following convergence property: $$ \mathbb{P}\left( \lim_{k\to\infty}|x_k| = 0 \right) = 1. $$

There is an important observation: the step size $\alpha_k$ converges to $0$ with probability $1$. This is due to the forcing function $\rho(\alpha)$: if $\alpha_k \not\to 0$, then infinitely many steps are accepted with $\alpha_k$ bounded away from $0$, so $\sum_{k=1}^\infty \rho(\alpha_k)=\infty$ over the accepted steps, which contradicts the fact that $f(x)$ is bounded below.
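The contradiction can be spelled out by telescoping the acceptance condition over the successful iterations $k_1 < k_2 < \cdots$ (a sketch using only the definitions above):

```latex
\[
  0 \;\leqslant\; f\bigl(x_{k_n + 1}\bigr)
    \;\leqslant\; f(x_0) \;-\; \sum_{j=1}^{n} \rho\bigl(\alpha_{k_j}\bigr),
  \qquad n \geqslant 1,
\]
\[
  \text{hence}\qquad
  \sum_{j \geqslant 1} \rho\bigl(\alpha_{k_j}\bigr)
  \;\leqslant\; f(x_0) \;<\; \infty
  \quad\Longrightarrow\quad
  \alpha_{k_j} \longrightarrow 0 .
\]
```

Between two successes $\alpha_k$ is only halved, and $\alpha_{k_j+1} = 2\alpha_{k_j}$, so in fact $\alpha_k \to 0$ along the whole sequence.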

I find this well-defined "simple" question surprisingly hard to analyze. I believe the algorithm converges. I tried to prove this with the Borel–Cantelli lemma but failed.

I think the step size $\alpha_k$ can be regarded as a stochastic process: $$\alpha_k = \alpha_0 2^{\sum_{i=0}^{k-1} Y_i},\quad k\ge 1,$$ where $Y_i = 2\cdot\mathbf{1}_{\{\alpha_{i}<-d_ix_i\}}-1$. Then we may consider $S_k = \sum_{i=0}^{k-1} Y_i$; however, it is far from a simple random walk, since the increments $Y_i$ depend on the whole history. Any idea will help. Thank you.
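The representation $\alpha_k = \alpha_0 2^{S_k}$ can be verified numerically alongside the iteration, using the equivalent acceptance test $\alpha_k < -d_k x_k$ (the function name and parameters are mine; powers of two are exact in floating point, so the equality check is safe):

```python
import random

def check_representation(steps=500, seed=1):
    """Check alpha_k = alpha_0 * 2^{S_k} with S_k = sum_{i<k} Y_i,
    where Y_i = +1 on an accepted step and -1 otherwise."""
    rng = random.Random(seed)
    x, alpha, alpha0, S = 1.0, 1.0, 1.0, 0
    for _ in range(steps):
        d = rng.choice((-1, 1))
        accepted = alpha < -d * x       # equivalent form of the decrease test
        Y = 1 if accepted else -1       # Y_i = 2 * 1{accepted} - 1
        if accepted:
            x, alpha = x + alpha * d, 2.0 * alpha
        else:
            alpha = 0.5 * alpha
        S += Y
        assert alpha == alpha0 * 2.0 ** S   # the claimed representation
    return x, alpha, S
```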

Accepted answer:

The answer is yes, if I am not mistaken. The idea is that $\alpha_k$ cannot stay very small forever: it will eventually be of the same order as $x_k$, and then $x_{k + 1} = x_k + d_k\alpha_k$ will be significantly closer to $0$ than $x_k$ (at least twice as close).

First of all, I assume your initial step size $\alpha_0$ is positive. I also assume $x_0 \neq 0$; otherwise the problem is trivial. In this case, whenever the test succeeds at step $k$, i.e. $\alpha_k < -d_kx_k$, we have $d_kx_k < 0$, so the sign of $d_k$ is opposite to the sign of $x_k$. This implies that $x_{k + 1} = x_k + d_k\alpha_k = d_k(\alpha_k + d_kx_k)$ also has the sign opposite to $d_k$ (because $\alpha_k + d_kx_k < 0$), i.e. the same sign as $x_k$. We deduce that $(x_k)$ has a constant sign. Without loss of generality, assume that $x_0 > 0$; then $x_k > 0$ for all $k$, and $(x_k)$ is nonincreasing.

Let us write, for all $k$, $\alpha_k = \alpha_02^{m_k}$ where $m_k \in \mathbb{Z}$ and $m_0 = 0$. Notice that at each step $k$, the acceptance condition $\alpha_k < -d_kx_k$ is equivalent to $d_k = -1$ and $\alpha_k < x_k$ (since $x_k > 0$), i.e. $m_k < \log_2\left(\frac{x_k}{\alpha_0}\right)$, which is equivalent to $m_k \leqslant \left\lceil\log_2\left(\frac{x_k}{\alpha_0}\right)\right\rceil - 1$ since $m_k$ is an integer.
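The integrality step used here ($m < t$ iff $m \leqslant \lceil t\rceil - 1$ for an integer $m$ and a real $t$) can be spot-checked on a small grid (an illustration only):

```python
import math

# For an integer m and a real t, m < t holds iff m <= ceil(t) - 1:
# if t is an integer, ceil(t) - 1 = t - 1; otherwise ceil(t) - 1 = floor(t).
for m in range(-6, 7):
    for t in (x / 4.0 for x in range(-20, 21)):
        assert (m < t) == (m <= math.ceil(t) - 1)
```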

Now, consider the nonincreasing sequence of integers $a_k = \left\lceil\log_2\left(\frac{x_k}{\alpha_0}\right)\right\rceil - 1$. For all $k \in \mathbb{N}$, let $I_k = \{q \geqslant k \mid a_q = a_k\}$. Each $I_k$ is clearly an interval of integers. While $q \in I_k$, we have, $$ m_{q + 1} = \left\{ \begin{array}{l} m_q + 1 \textrm{ if } d_q = -1 \textrm{ and } m_q \leqslant a_k,\\ m_q - 1 \textrm{ otherwise}. \end{array} \right. $$ We recognise a random walk on $\mathbb{Z}$ with a stop line at $a_k$ (which is fixed while we remain in $I_k$). Notice that it is possible that $m_{q + 1} > a_k$, but in this case the $m_q$ decrease deterministically until $m_q = a_k$ again.
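To see the stop-line behaviour, one can simulate the exponent dynamics alone, freezing $a_k$ at a fixed value (a toy illustration; the function name and parameters are mine):

```python
import random

def stop_line_walk(a=0, m0=5, steps=20000, seed=2):
    """Count visits of m to the stop line a under the dynamics above."""
    rng = random.Random(seed)
    m, hits = m0, 0
    for _ in range(steps):
        if m <= a:
            # at or below the line: d_q = -1 accepts (m goes up),
            # d_q = +1 rejects (m goes down), each with probability 1/2
            m += 1 if rng.choice((-1, 1)) == -1 else -1
        else:
            m -= 1   # strictly above the line, every step is rejected
        hits += (m == a)
    return hits
```

Starting above the line, the walk is pushed down to it deterministically; at or below the line it moves like a symmetric random walk, so it keeps returning to the line.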

This walk is recurrent on $\mathbb{Z} \cap \left(-\infty, a_k\right]$, so we will have $m_q = a_k$ infinitely many times unless $I_k$ is bounded. And among the infinitely many times when $m_q = a_k$, we will have $d_q = -1$ infinitely often with probability $1$. Let $q \geqslant k$ be an integer such that $m_q = a_k$ and $d_q = -1$.

In this case, \begin{align*} x_{q + 1} & = x_q - \alpha_q\\ & = x_q - \alpha_02^{a_k}\\ & = \alpha_0\left(\frac{x_q}{\alpha_0} - 2^{a_k}\right) > 0, \end{align*} where positivity holds because the step was accepted, i.e. $\alpha_q < x_q$. Therefore, since $(x_k)$ is nonincreasing and $q \geqslant k$, \begin{align*} a_{q + 1} & = \left\lceil\log_2\left(\frac{x_{q + 1}}{\alpha_0}\right)\right\rceil - 1\\ & = \left\lceil\log_2\left(\frac{x_q}{\alpha_0} - 2^{a_k}\right)\right\rceil - 1\\ & \leqslant \left\lceil\log_2\left(\frac{x_k}{\alpha_0} - 2^{a_k}\right)\right\rceil - 1. \end{align*} Moreover, $a_k = \left\lceil\log_2\left(\frac{x_k}{\alpha_0}\right)\right\rceil - 1 \geqslant \log_2\left(\frac{x_k}{\alpha_0}\right) - 1$, so $2^{a_k + 1} \geqslant \frac{x_k}{\alpha_0}$, which implies $\frac{x_k}{\alpha_0} - 2^{a_k} \leqslant 2^{a_k}$. We deduce that $\log_2\left(\frac{x_k}{\alpha_0} - 2^{a_k}\right) \leqslant a_k$, thus $a_{q + 1} \leqslant \left\lceil\log_2\left(\frac{x_k}{\alpha_0} - 2^{a_k}\right)\right\rceil - 1 \leqslant a_k - 1$.
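The conclusion $a_{q + 1} \leqslant a_k - 1$ can be spot-checked numerically for a successful step taken exactly at the line (assuming, as above, a positive iterate and step; the sample values are arbitrary):

```python
import math

def a_index(x, alpha0=1.0):
    """a = ceil(log2(x / alpha0)) - 1, as defined in the answer."""
    return math.ceil(math.log2(x / alpha0)) - 1

alpha0 = 1.0
for x in (0.3, 0.77, 1.0, 1.5, 2.0, 3.14, 10.0):
    a = a_index(x, alpha0)
    x_new = x - alpha0 * 2.0 ** a            # accepted step of size alpha_0 * 2^a
    assert x_new > 0                         # the iterate stays positive
    assert a_index(x_new, alpha0) <= a - 1   # the index drops by at least 1
```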

By definition, this means that $q + 1 \notin I_k$. Such a $q$ exists with probability $1$ for every $k$, hence each $I_k$ is bounded with probability $1$. Therefore $a_k \rightarrow -\infty$, so $x_k \rightarrow 0$ with probability $1$.