Descent directions at stationary points using definition of the notation little-o - o()

86 Views Asked by At

Assume that $f: \mathbb{R}^n \rightarrow \mathbb{R}^n$ is $C^2$ with $\nabla f(x) = 0$ and $\nabla^2f(x)$ indefinite (with both positive and negative eigenvalue(s)) for some point $x\in \mathbb{R}^n$. Show that $f$ admits descent directions (in the weak sense) at $x$.

answer: Let $f: \mathbb{R}^n\rightarrow \mathbb{R}, C^2$ with $\nabla f(\textbf{x}) = 0$ (x is a stationary point) and $\nabla^2f(\textbf{x})$ is indefinite (x is a saddle point)

A descent direction in the weak sense is defined as: $$ d\in \mathbb{R}^n, \exists \varepsilon >0, \forall \lambda \in (0,\varepsilon) \quad f(\textbf{x} + \lambda d) <f(\textbf{x}) $$ A second order Taylor expansion gives, for $\lambda \in \mathbb{R}$ and $\textbf{d}\in \mathbb{R}^n$ \begin{align*} f(\textbf{x} + \lambda \textbf{d}) &= f(\textbf{x}) + \underbrace{\lambda \nabla f(\textbf{x})^\top \textbf{d}}_{=0} + \lambda^2\textbf{d}^\top\nabla^2 f(\textbf{x})\textbf{d} + o(\|\lambda \textbf{d}\|^2)\\ &= f(\textbf{x}) + \lambda^2 \textbf{d}^\top\nabla^2f(\textbf{x})d+o(\lambda^2\|\textbf{d}\|^2) \end{align*} Let $\textbf{d}$ be a normed eigenvector associated to the eigenvalue $\alpha < 0$ of $\nabla^2f(\textbf{x})$. Then: $$ f(\textbf{x}+\lambda \textbf{d}) = f(\textbf{x}) + \lambda^2\alpha \underbrace{\|\textbf{d}\|^2}_{=1}+ o(\lambda^2) $$ As $o(\lambda^2)$ is negligible compared to $\lambda^2\alpha$ when $\lambda \rightarrow 0$, we know that $$ \exists \eta >0, \forall \lambda \leq \eta \qquad \|o(\lambda^2)\|< \|\alpha \lambda^2\| $$ So for $\lambda \leq \eta$ $$\underbrace{\lambda^2\alpha}_{<0} + \underbrace{o(\lambda^2)}_{?} <0 $$ Therefore $$f(\textbf{x} + \lambda \textbf{d}) = f(\textbf{x}) + \lambda^2\alpha + o(\lambda^2)<f(\textbf{x}) \quad \text{for } \lambda \in (0,\eta) $$ $f$ admits a descent direction in $\textbf{x}$

Here is my question: I have trouble understanding the following sentence: As $o(\lambda^2)$ is negligible compared to $\lambda^2\alpha$ when $\lambda \rightarrow 0$. It has probably to do with my lack of understanding of little o

Here is the definition I found:

Intuitively, the assertion "$f(x)$ is $o(g(x))$" (read "$f(x)$ is little-o of $(g(x))$") means that $g(x)$ grows much faster than $f(x)$.

Or

As $g(x)$ is nonzero, or at least becomes nonzero beyond a certain point, the relation $f(x) = o(g(x))$ is equivalent to $$ \lim_{x \to \inf}\frac{f(x)}{g(x)} = 0 $$

Those two definitions define the relationship of $f(x)$ and $g(x)$ when $x$ tends to infinity. In our case $x$ (or $\lambda$) tends to zero. How can we make a paralell?