I'm trying to understand how Newton's method in optimization works.
This is the algorithm:
$S_0)$ Choose $x_{0}\in \mathbb{R}^{n},\rho>0,\ p>2,\ \beta\in(0,1), \displaystyle \sigma\in\left(0,\frac{1}{2}\right),\ \varepsilon\geq 0$, set $k =0$
$S_1)$ If $\Vert\nabla f(x_{k})\Vert\leq\varepsilon \ \ \ $ STOP
$S_2)$ Determine $d_{k}$ from $D^{2}f(x_{k})d_{k}=-\nabla f(x_{k})$. If $d_{k}$ cannot be computed, or the descent condition
$$\nabla f(x_{k})^{T}d_{k}\leq-\rho\Vert d_{k}\Vert^{p}$$
is violated, set $d_{k}=-\nabla f(x_{k})$
$S_3)$ Determine the maximal $t_{k}\in\{\beta^{j}:j=0,1,2,\ldots\}$ with $$ f(x_{k}+t_{k}d_{k})<f(x_{k})+t_{k}\sigma\nabla f(x_{k})^{T}d_{k} $$
$S_4)$ Set $x_{k+1}=x_{k}+t_{k}d_{k},\ k:=k +1$, go to $S_1)$.
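The steps $S_0)$–$S_4)$ above can be sketched in code. This is only an illustrative implementation; the default parameter values (`rho`, `p`, `beta`, `sigma`, `eps`) are assumptions of mine, not taken from the algorithm statement, which leaves their choice open within the stated ranges.

```python
import numpy as np

def globalized_newton(f, grad, hess, x0, rho=1e-8, p=2.1,
                      beta=0.5, sigma=1e-4, eps=1e-8, max_iter=200):
    """Sketch of the globalized Newton method S0)-S4).

    rho > 0, p > 2, beta in (0,1), sigma in (0, 1/2), eps >= 0
    as required in S0); the defaults are illustrative choices.
    """
    x = np.asarray(x0, dtype=float)  # S0) starting point
    for _ in range(max_iter):
        g = grad(x)
        # S1) stopping test on the gradient norm
        if np.linalg.norm(g) <= eps:
            break
        # S2) try the Newton direction D^2 f(x_k) d_k = -grad f(x_k);
        # fall back to steepest descent if the system cannot be solved
        # or the descent condition g^T d <= -rho * ||d||^p fails
        try:
            d = np.linalg.solve(hess(x), -g)
            if g @ d > -rho * np.linalg.norm(d) ** p:
                d = -g
        except np.linalg.LinAlgError:
            d = -g
        # S3) Armijo line search: largest t = beta^j with
        # f(x + t d) < f(x) + t * sigma * g^T d
        t = 1.0
        while f(x + t * d) >= f(x) + t * sigma * (g @ d):
            t *= beta
        # S4) update the iterate
        x = x + t * d
    return x
```

For example, on the Rosenbrock function starting from $(-1.2, 1)$ the method converges to the minimizer $(1,1)$: far from the solution the Armijo search damps the steps, while near the solution full Newton steps ($t_k = 1$) are accepted, which is where the fast local convergence comes from.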
My question:
As I understand it, the algorithm above generates a sequence $x_k$ which converges to a strict local minimum of the twice differentiable function $f$. Why is this algorithm so interesting if it can't even find the global minimum of the given function?
Thank you very much for your time!
It's not easy to find the global minimum of a general function. Many functions have lots of local minima, and in principle an algorithm would need to check all of them to identify the global one. This is, for example, a central problem in training neural networks. I'm no expert on this, but I'd say the algorithm is interesting because it is simple, it works for a relatively general class of functions, and it is fast.