Convergence to a local minimum of a nonconvex function

687 Views Asked by At

Let $f: \mathbb{R}^n \to \mathbb{R}$ be a differentiable nonconvex function that is bounded below. Let $x^* \in \mathbb{R}^n$ be a local minimum of $f$. Suppose we run gradient descent algorithm $$x^{k+1}=x^k -\alpha \nabla f(x^k)$$ from $x^0 \in \mathbb{R}^n$ sufficiently close to $x^*$, i.e., $\|x^0-x^*\|< \epsilon$. How can we show that the generated sequence, $(x^k)_{k\geq 0}$ converges to $x^*$?

(We are assuming $\alpha$ is small so that iterates are close to each other.)


Here is what I tried. Using Taylor expansion one can write the following:

$$ f(x^{k+1}) = f(x^k) + \left\langle \nabla f(x^k), x^{k+1}-x^k \right\rangle + \mathcal{o}\left(\|x^{k+1}-x^{k}\|^2\right) $$

Then,

$$ f(x^{k+1}) = f(x^k) - \alpha \| x^{k+1}-x^k \|^2 + \mathcal{o}\left(\|x^{k+1}-x^{k}\|^2\right) $$

What I need is to show $\|x^k-x^*\| \to 0$.

1

There are 1 best solutions below

9
On BEST ANSWER

The result does hold for such a general case, even if the function has only one global minimizer. For example, there is no way to guarantee that the function does not oscillate too much around the minimum like the function $g$ such that $g(0)=0$ and $$g(x)=(\sin(1/x)+1.5)x^{2},$$ for all $x\not=0$, does. For example, the sequence generated by the function $g$ that starts at any of its infinite critical points around $0$ would stay still at this critical point forever, i.e., $x^{k}=x^{k+1},$ for all $k\in\mathbb{N}$, whenever $x^{0}$ is a critical point. To surpass this difficulty, I will need the Łojasiewicz inequality, which holds for analytic functions.

Let $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ be a function such that $x^{\infty}$ satisfies

$$f(x)\leq f(x^{\infty}) \implies x = x^{\infty}\tag{1}$$

for a neighborhood of $x^{\infty}$, say $B[x^{\infty}, \delta_0].$ This is stronger than asking for $x^{\infty}$ being a local minimizer. This means that $x^{\infty}$ is a strict local minimizer. That's the case for the point zero in the function $g$ above.

For now on, it's needed some control over the gradient as well. Precisely, suppose that $f$ satisfies the Łojasiewicz inequality $$M\left(f(x) - f(x^{\infty}) \right)^{\theta} \leq \|\nabla f (x)\|,\tag{2}$$ for all $x\in B(x^{\infty}, \delta_{1}),$ with $\dfrac{1}{2}\leq \theta < 1,$ and that $$f(y)-f(x) \leq \nabla f(x)^{T}(y - x) +\dfrac{L}{2}\|y - x\|^{2},\tag{3}$$ for all $x,y \in B(x^{\infty}, \delta_2),$ i.e., the gradient is Lipschitz continuous around $x^{\infty}.$

Let us start, let $\alpha<\dfrac{2}{L}$, we can take $\delta_3>0$ so that for every $x \in B[x^{\infty},\delta_{3}]$ it implies $$\alpha \|\nabla f (x) \|\leq \dfrac{\min\{\delta_0,\delta_1,\delta_2 \}}{2}.\tag{4}$$

Let $\delta_4= \dfrac{\min\{\delta_0,\delta_1,\delta_2, \delta_3 \}}{2}$, since $f$ is continuous, the function $f$ attains some minimal $m\in\mathbb{R}$ in the circular crown $$\dfrac{\delta_4}{2} \leq \|x-x^{\infty}\| \leq \min\{\delta_0,\delta_1,\delta_2 \}.$$

Now, for any $x \in B\left(x^{\infty},\dfrac{\delta_4}{2}\right)$, satisfying $f(x) < m$, given $x_{+}=x - \alpha \nabla f(x), $ we have that $$\|x_{+}-x^{\infty}\| \overset{(4)}{<} \min\{\delta_0,\delta_1,\delta_2 \}. $$ This implies that

$$\begin{align} f(x_{+}) \leq & f(x) - \alpha \|\nabla f(x)\|^{2} + \dfrac{L \alpha^{2}}{2} \|\nabla f(x)\|^{2}\\ \leq & f(x) - \left(1 - \dfrac{\alpha L}{2} \right) \alpha \|\nabla f(x)\|^{2}\tag{5}. \end{align}$$

Hence,

$$f(x_{+}) \leq f(x) < m.$$

This means that, being the set

$$U = \left\lbrace x \in B\left(x^{\infty}, \dfrac{\delta_4}{2}\right) : f(x)<m \right\rbrace $$

open and bounded with $x^{\infty} \in U$, we can say that whenever we start with $x_0 \in U$ we will stay forever in that set, i.e., if $x_0\in U$, then $x^{k} \in U$ for all $k\in\mathbb{N}.$

Now, take $x^0$ any point in $U$ (since $U$ is open and contains $x^{\infty}$ it's possible to actually take an open ball around $x^{\infty}$). Being $ x^{k} \in B \left[ x^{\infty}, \dfrac{\delta_4}{2} \right]$ for all $k$, we have that $\{x^{k}\}_k$ is bounded, and to prove convergence of the sequence we need only to prove that all subsequences of $\{x^{k}\}_k$ converges to $x^{\infty}$.

For the first part, we will prove that $x^{k}\rightarrow x^{\infty}$ for some infinite subset of $\mathbb{N}$. Indeed, let us suppose by contradiction that there is no infinite subset $K$ of $\mathbb{N}$ so that $\|x^{k}-x^{\infty}\| \overset{K}{\rightarrow} 0$. This means that, for some $\delta_5>0$, $\|x^{k} - x^{\infty}\|>\delta_5>0$ for all $k \in \mathbb{N}$. By the continuity of $f$ and condition $(1)$, this means that $$f(x^k) - f(x^{\infty})>\delta_6>0\tag{6}$$ for all $k\in\mathbb{N}.$ Hence,

$$\begin{align} f(x^{k+1}) - f(x^{\infty}) \overset{(5)}{\leq} & f(x^k) - f(x^{\infty}) - \left(1 - \dfrac{\alpha L}{2} \right) \alpha \|\nabla f(x^k)\|^{2} \\ \overset{(2)}{\leq} & (f(x^{k}) - f(x^{\infty})) - M \left(1 - \dfrac{\alpha L}{2} \right) \alpha (f(x^{k}) - f(x^{\infty}))^{2 \theta}\\ = & \left(1 - M \left(1 - \dfrac{\alpha L}{2} \right) \alpha (f(x^{k}) - f(x^{\infty}))^{2 \theta - 1} \right) (f(x^{k}) - f(x^{\infty})). \end{align}$$ Thus,

$$\begin{align} f(x^{k+1}) - f(x^{\infty}) \overset{(6)}{\leq} \left(1 - M \left(1 - \dfrac{\alpha L}{2} \right) \alpha \delta_{6}^{2 \theta - 1} \right) (f(x^{k}) - f(x^{\infty})), \end{align}$$

considering the decreasing nature of $\{f(x^{k})\}_{k}$, this implies that $f(x^{k})\rightarrow f(x^{\infty})$. Hence, taking a subsequence if necessary, there will be least a subsequence of $\{ x^{k} \}_{k}$ so that $x^{k} \rightarrow x^{\infty},$ which is a contradiction. Hence, there exist some infinite subset $K$ such that $x^{k}\overset{K}{\rightarrow} x^{\infty}.$

For the last part, given an $\epsilon>0$, we will need to prove that the subset of indexes

$$\mathcal{I} = \{ k\in\mathbb{N}\ :\ \|x^{k} - x^{\infty}\| \geq \epsilon \}$$

can't be infinite. Again, by contradiction, let us suppose that this set is infinite. Since there exist some infinite subset $K$ such that $x^{k}\overset{K}{\rightarrow} x^{\infty}$, for each $k \in \mathcal{I}$, we can choose the first $n_k\in \mathbb{N}$ such that $n_k>k$ and

$$\|x^{n_k} - x^{\infty}\| \leq \dfrac{\epsilon}{2}.$$ Note that we can take a $\delta_7>0$ so that $\|\nabla f(x)\| \geq \delta_7,$ for all $x \in\mathbb{R}^{n}$ with $\dfrac{\delta_4}{2} \geq \| x-x^{\infty}\|> \dfrac{\epsilon}{2},$ since, locally, only $x^{\infty}$ is a zero for $g(x)=\|\nabla f(x)\|$. Now, for all the intermediate $x^{i}$ with $k \leq i\leq n_k$ we have $\dfrac{\delta_4}{2} \geq \| x^{i}-x^{\infty}\| > \dfrac{\epsilon}{2}.$ Hence, $$\|\nabla f(x^{i})\| \geq \delta_7,\tag{7}$$ for all $n_k > i \geq k.$

Now, for all $k \in \mathcal{I},$

$$\begin{align} \dfrac{\epsilon}{2} \leq & \|x^{k} - x^{\infty}\| - \|x^{n_k} - x^{\infty}\| \\ \leq & \|x^{k} - x^{n_{k}}\| \\ \leq & \sum^{n_k-1}_{i = {k}} \|x^{i+1} - x^{i}\| \\ \leq & \sum^{n_k-1}_{i = {k}} \alpha \|\nabla f(x^{i})\| \\ \overset{(7)}{\leq} & \dfrac{1}{\delta_7}\sum^{n_k-1}_{i = {k}} \alpha \|\nabla f(x^{i})\|^{2} \\ \overset{(5)}{\leq} & - \dfrac{1}{\delta_7 \left(1 - \dfrac{\alpha L}{2} \right)} \sum^{n_k-1}_{i = {k}} f(x^{i+1}) - f(x^{i}) \\ \leq & \dfrac{f(x^{k}) - f(x^{n_k})}{\delta_7 \left(1 - \dfrac{\alpha L}{2} \right)} \end{align}$$

Hence,

$$\delta_7 \dfrac{\epsilon}{2} \left(1 - \dfrac{\alpha L}{2} \right) \leq f(x^{{k}}) - f(x^{n_k}),$$

for all $k\in\mathcal{I}.$ Since $\{f(x^{k})\}_{k}$ is decreasing sequence and has convergent subsequence, $\{f(x^{k})\}_{k}$ actually converges, but it contradicts the above equation.


The function $g$ is differentiable

First, $g(0) = 0$, by definition, and, by definition as well $$g(x)=(\sin(1/x)+1.5)x^{2},$$ for all $x\not=0$. Let us prove that $g$ is differentiable at $0$ with derivative $0$. We have $$\begin{align} 0 \leq & \dfrac{|g(h)-g(0)|}{|h|} \\ = & \dfrac{\left|\left(\sin(\dfrac{1}{h}) - 1.5\right)h^{2}\right|}{|h|} \\ \leq & \left|\left(\sin(\dfrac{1}{h}) - 1.5\right)\right||h|\\ \leq & 2.5 |h|,\\ \end{align}$$ hence by squeeze theorem $$g'(0)= \lim \dfrac{g(h)-g(0)}{h} = 0.$$ For points with $x\not=0$, we have that $$g(y)=(\sin(1/y)+1.5)y^{2},$$ for all points $y$ in a neighborhood of $x$ not containing zero. Since, $g$ is locally a quotient of differentiable functions in such case, $g$ is differentiable as well.