Consider a system of $n-1$ (independent) equations in $n$ unknowns. Generically, the solution set is a family of curves. When a point of one curve is known, numerical continuation can be used to trace the whole curve (numerically, of course).
The advantage of numerical continuation techniques is not clear to me, given that the system can also be written as a system of ODEs, for which many efficient techniques exist.
For example, consider the trivial case: $$ f(X)=x_1+\sin(x_2^2)$$ with $X^\top = [x_1\ x_2]$. Any solution curve $X(t)$ of $f(X)=0$ satisfies
$$\nabla_X f(X(t))\cdot X'(t) = 0 \qquad {\textit{i.e.}}\qquad x_1'(t)+2x_2(t)x_2'(t)\cos(x_2(t)^2)=0$$ where $t$ is the arc length along the curve. This equation can be complemented by $x_1'(t)^2+x_2'(t)^2=1$, yielding two ODEs in the two unknowns $x_1$ and $x_2$. As with continuation, solving them numerically requires initial conditions on the curve, for instance $x_1(0)=0$ and $x_2(0)=0$.
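Solving these two equations for the derivatives gives an explicit form of the system (a short worked step; the sign $\pm$, which fixes the direction of travel along the curve, is a choice not made above):
$$ X'(t) = \pm\frac{1}{\sqrt{1+4x_2(t)^2\cos^2(x_2(t)^2)}}\begin{bmatrix} -2x_2(t)\cos(x_2(t)^2) \\ 1 \end{bmatrix}, $$
that is, the unit vector orthogonal to $\nabla_X f = [1\ \ 2x_2\cos(x_2^2)]$.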
So my question is: what is the point of pseudo-arclength continuation and other numerical continuation techniques compared to writing the problem as ODEs?
Predictor-corrector methods like pseudo-arclength continuation guarantee, through the corrector step, that $f(X)$ stays small at each computed point along the curve. This does not necessarily ensure that $X$ is very close to the curve itself if the curve does something weird (for instance near a singular point, where $\nabla f$ is small), but bifurcation theory (and numerical bifurcation detection) can be used to avoid such issues. This property is what makes this class of methods robust.
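To make the predictor and corrector steps concrete, here is a minimal sketch in plain Python for a single equation in two unknowns. It assumes the test function $f(x,y)=x^2+y^2-1$, a fixed step `h`, and a hand-written $2\times 2$ Newton solve; the names `tangent` and `continuation_step` are illustrative, not taken from any library, and a real implementation would add step-size control and bifurcation detection.

```python
import math

def f(x, y):
    # Assumed test problem: the unit circle.
    return x * x + y * y - 1.0

def grad_f(x, y):
    return 2.0 * x, 2.0 * y

def tangent(x, y, prev=None):
    # Unit vector orthogonal to the gradient, oriented like the previous tangent.
    gx, gy = grad_f(x, y)
    norm = math.hypot(gx, gy)
    tx, ty = -gy / norm, gx / norm
    if prev is not None and tx * prev[0] + ty * prev[1] < 0.0:
        tx, ty = -tx, -ty
    return tx, ty

def continuation_step(x, y, t, h, tol=1e-12, max_iter=20):
    # Predictor: move a distance h along the tangent.
    xp, yp = x + h * t[0], y + h * t[1]
    xn, yn = xp, yp
    # Corrector: Newton iteration on the augmented system
    #   f(X) = 0,   t . (X - X_pred) = 0.
    for _ in range(max_iter):
        r1 = f(xn, yn)
        r2 = t[0] * (xn - xp) + t[1] * (yn - yp)
        if abs(r1) < tol and abs(r2) < tol:
            break
        gx, gy = grad_f(xn, yn)
        det = gx * t[1] - gy * t[0]
        # Cramer's rule for [[gx, gy], [tx, ty]] d = -[r1, r2].
        dx = (-r1 * t[1] + r2 * gy) / det
        dy = (-gx * r2 + t[0] * r1) / det
        xn += dx
        yn += dy
    return xn, yn, tangent(xn, yn, prev=t)

# Trace the circle many times around, starting from a known point.
x, y = 1.0, 0.0
t = tangent(x, y)
max_residual = 0.0
for _ in range(1000):
    x, y, t = continuation_step(x, y, t, h=0.1)
    max_residual = max(max_residual, abs(f(x, y)))
print(max_residual)
```

Because every accepted point has passed the Newton corrector, the residual $|f|$ does not grow with the number of steps, however long the curve is followed.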
By contrast, standard numerical methods for ODEs (e.g. the Dormand-Prince method, built into Matlab as ode45) tend to accumulate errors gradually, with nothing pulling the computed solution back towards $f(X)=0$, so they lack robustness in this sense.
On the other hand, there are numerical methods for ODEs that are designed to preserve conserved quantities. Such methods numerically conserve either the true conserved quantity or some numerical perturbation of it. Some of these (used for Hamiltonian systems) are called symplectic methods. In general such methods are robust in the same way as the predictor-corrector methods.
A good test problem for this sort of thing is just the familiar $x^2+y^2=1$. Standard ODE methods applied to your system with this $f$ will tend to drift away from the circle.
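As a rough illustration of the drift, the sketch below integrates the unit-tangent ODE for $f(x,y)=x^2+y^2-1$ with explicit Euler in plain Python. Euler is chosen only because its drift is easy to see and to verify by hand; it exaggerates the effect, and a higher-order method such as Dormand-Prince drifts far more slowly but still has no corrector. The function names and the step size are assumptions made for the example.

```python
import math

def f(x, y):
    return x * x + y * y - 1.0

def unit_tangent(x, y):
    # Unit vector orthogonal to grad f = (2x, 2y), counterclockwise.
    gx, gy = 2.0 * x, 2.0 * y
    norm = math.hypot(gx, gy)
    return -gy / norm, gx / norm

def euler_path(x, y, h, n_steps):
    # Explicit Euler on X'(t) = unit_tangent(X(t)); no corrector step.
    for _ in range(n_steps):
        tx, ty = unit_tangent(x, y)
        x, y = x + h * tx, y + h * ty
    return x, y

x_end, y_end = euler_path(1.0, 0.0, h=0.01, n_steps=10_000)
drift = abs(f(x_end, y_end))
print(drift)
```

Each Euler step is orthogonal to the position vector and has length $h$, so it adds exactly $h^2$ to $x^2+y^2$. After $N$ steps the residual is $f \approx N h^2$, which here is about $1$: the computed point has spiralled out to radius $\sqrt{2}$.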