I am using Newton's method to solve an equation of the form $y=f(x)$ for increasing values of $y$. The value of $x$ found for some $y$ is used for the next $y$ as a starting approximation and this works fine.
$f$ is increasing but may have stationary points, and I do have situations where the range of $y$ values of interest starts or ends at a stationary point, or very close to one. This causes the algorithm to suffer a division by zero or to yield very inaccurate roots.
Is there a simple way to cope? Is there a known modification of Newton's method that handles this situation?
I know that double roots can be handled by the modified iteration $$x\leftarrow x-2\frac{f(x)-y}{f'(x)},$$ but this still fails exactly at the root, and one needs to know when to activate the factor $2$.
For information, my function $f$ is a cubic polynomial, but I want to avoid the explicit computation of the roots by Cardano's formulas.
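For concreteness, the continuation loop I am using is essentially the following sketch (function names and tolerances are illustrative); the breakdown occurs when `fprime(x)` vanishes or nearly vanishes inside the loop:

```python
def newton_continuation(f, fprime, ys, x0, tol=1e-12, max_iter=50):
    """Solve f(x) = y for each y in ys by Newton's method,
    reusing the previous root as the next starting guess."""
    roots = []
    x = x0
    for y in ys:
        for _ in range(max_iter):
            d = fprime(x)
            if d == 0.0:
                # this is exactly the breakdown described above
                raise ZeroDivisionError("stationary point hit at x = %r" % x)
            step = (f(x) - y)/d
            x -= step
            if abs(step) < tol:
                break
        roots.append(x)
    return roots

# a well-behaved cubic (no stationary point), where the scheme works fine
f = lambda x: x**3 + x
fp = lambda x: 3*x**2 + 1
ys = [0.0, 0.5, 1.0, 1.5, 2.0]
roots = newton_continuation(f, fp, ys, x0=0.0)
```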
Update:
If we restate Newton's method at a stationary point, we move on to the next term of the Taylor expansion,
$$f(x+h)-y\approx f(x)-y+\frac{h^2}2f''(x),$$
so that $$h\approx\sqrt{-2\,\frac{f(x)-y}{f''(x)}}.$$
In fact, the approximation
$$f(x+h)-y\approx f(x)-y+h f'(x)+\frac{h^2}2f''(x)$$ always holds, giving the general formula
$$h\approx \frac{-f'(x)\pm\sqrt{f'(x)^2-2f''(x)(f(x)-y)}}{f''(x)}.$$
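Evaluated naively, this quadratic formula loses accuracy when $f''$ is small or when $f'$ and the square root nearly cancel. A sketch of a numerically careful evaluation of the quadratic model $f(x)-y+hf'(x)+\tfrac{h^2}2 f''(x)=0$ (the clamping of a negative discriminant and the fallback branch are my own choices, not forced by the formula):

```python
import math

def second_order_step(r, d1, d2):
    """Step h solving the quadratic model  r + h*d1 + (h^2/2)*d2 = 0,
    where r = f(x) - y, d1 = f'(x), d2 = f''(x).
    Picks the root of smaller magnitude, so it reduces to the Newton
    step -r/d1 as d2 -> 0 (via the stable-denominator form)."""
    if r == 0.0:
        return 0.0
    if d2 == 0.0:
        return -r / d1                     # plain Newton step
    disc = d1*d1 - 2.0*d2*r
    if disc < 0.0:
        disc = 0.0                         # model never reaches y: clamp to its vertex
    sq = math.sqrt(disc)
    # add the square root to d1 with matching sign to avoid cancellation
    denom = d1 + math.copysign(sq, d1) if d1 != 0.0 else sq
    if denom == 0.0:
        # f'(x) = 0 and no real root: step by the vertex distance as a heuristic
        return math.sqrt(2.0*abs(r/d2))
    return -2.0*r/denom
```

The last return uses the identity that the two roots multiply to $2r/f''$: dividing $-2r$ by the large-magnitude conjugate denominator yields the small root without subtracting nearly equal quantities.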
Due to the increased complexity, I don't want to use that formula all the time.

Momentum-based algorithms are good at passing through stationary points while retaining fast convergence.
$$m_i=(1-\beta)f'(x_i)+\beta m_{i-1}=m_{i-1}-(1-\beta)\bigl(m_{i-1}-f'(x_i)\bigr)$$ $$x_{i+1}=x_i-(1-\beta^i)\,\frac{f(x_i)-y}{m_i}$$
The $1-\beta^i$ factor serves as bias correction, assuming one takes $m_0=0$: if $x_i$ were held fixed, we would get $m_i=(1-\beta^i)f'(x_i)$.
As $\beta\to0$ you recover Newton's method and its fast convergence, while larger $\beta$ gives better "resilience" against stationary points.
For your purposes, you know that $f'(x)\ge0$, so rather than setting $m_0=0$, you may want to initialize it to a small positive value. You could even reuse the same $m_i$ sequence across all values of $y$: you will likely converge to the solution of $y=f(x)$ for a given $y$ while $m_i$ is still sufficiently large, so continuing from where it left off works well for the next $y$.
If you go with the latter approach, you'll want to choose $\beta$ according to how densely your $y$ values cluster around the stationary points. If $\beta$ is too small and many $y$ lie near a stationary point, then $m_i$ will decay rapidly, which leaves you with the original issue.
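A sketch of this scheme in Python, reusing $m_i$ across all the $y$ values as suggested above (names and parameter values are illustrative, and the residual $f(x_i)-y$ stands in the numerator since we solve $f(x)=y$):

```python
def momentum_newton(f, fprime, ys, x0, beta=0.9, m0=0.0,
                    tol=1e-12, max_iter=200):
    """Newton continuation where the derivative in the denominator is
    replaced by an exponential moving average m_i of f'(x_i).
    The (1 - beta**i) factor is the bias correction for m0 = 0."""
    roots = []
    x, m, i = x0, m0, 0
    for y in ys:
        for _ in range(max_iter):
            i += 1
            m = beta*m + (1.0 - beta)*fprime(x)
            step = (1.0 - beta**i)*(f(x) - y)/m
            x -= step
            if abs(step) < tol:
                break
        roots.append(x)
    return roots

# monotone cubic with a stationary point at x = 0; the y range stays away from it here
f = lambda x: x**3
fp = lambda x: 3*x**2
ys = [0.5, 1.0, 1.5, 2.0]
roots = momentum_newton(f, fp, ys, x0=0.8)
```

Note that with $m_0=0$ the first update gives $m_1=(1-\beta)f'(x_0)$, so the starting point must not itself be a stationary point; initializing `m0` to a small positive value, as discussed above, removes that restriction.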