I read this intro to Newton method optimization
https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization#Method
Now, the intuition I get from the proof there: we are just looking for a $\Delta x$ such that the derivative is zero. Well, this effectively means that the function $f(x + \Delta x)$ achieves its maximum. Now we've found $\Delta x_1$.
Now - IMPORTANT QUESTION
WHY do we need to continue? Where can it move from this point? The point $x + \Delta x_1$ will already give us a maximum (local or global) of $f$, and we will not be able to find a further, better point, because we would have found it in the first step, right?
Why do we need many iterations for Newton optimization method then?
The optimization case is harder to visualize than the root-finding case, but they are really one and the same: the optimization version is just trying to find a root of $\nabla f$ using the root-finding version. They are only treated differently because the nonlinear system $g=0$ has some special properties when $g=\nabla f$ for some scalar function $f$.
The visualization exercise is easily done with a classic problem: root finding for $f(x)=x^2-2$. Consider the Newton method started at $x=1$. The method considers the tangent line $y=2(x-1)-1$, which has a zero at $x=1.5$. $f$ itself does not have a zero there, but $1.5$ is closer to the actual zero of $f$ (which is $\sqrt{2} \approx 1.414$) than $1$ was. The problem is that $f$ actually increased a bit faster than its tangent line did, so $f(1.5)>0$. The next Newton iteration takes that into account and sends you back to the left.
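To see those iterations numerically, here is a minimal sketch of the root-finding recursion $x \leftarrow x - f(x)/f'(x)$ applied to that example (the helper name `newton_root` is mine, not from any library):

```python
def newton_root(f, fprime, x, iters=5):
    """Plain Newton iteration for f(x) = 0, starting from x."""
    for _ in range(iters):
        # Replace f by its tangent line at x and jump to that line's zero.
        x = x - f(x) / fprime(x)
    return x

f = lambda x: x * x - 2
fprime = lambda x: 2 * x

# One step from x = 1 lands at 1.5, as in the text -- close, but not the root.
print(newton_root(f, fprime, 1.0, iters=1))   # 1.5
# A few more steps home in on sqrt(2).
print(newton_root(f, fprime, 1.0, iters=5))
```

Each step overshoots or undershoots slightly because the tangent line is only a local model of $f$; the next step corrects from the new point.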
The same thing happens in the optimization case: you make a quadratic approximation of the objective function and find the maximum of that quadratic approximation, but there is no reason to expect the maximum of the objective function to be in the same place. The quadratic model only agrees with the objective near the current point; unless the objective is itself exactly quadratic, the maximizer of the model is just a new (usually better) guess, so you rebuild the model there and repeat.