Solving many independent non-linear systems simultaneously


I'm working on solving many systems of nonlinear equations. Fortunately, the nonlinear function is the same for every system; only the parameters differ:

$$\begin{aligned} f(\vec{x}_0; c_0) &= 0 \\ f(\vec{x}_1; c_1) &= 0 \\ &\ \ \vdots \\ f(\vec{x}_N; c_N) &= 0 \end{aligned}$$

where the $\vec{x}_i$'s are the vectors to be solved for and the $c_i$'s are parameters that change from system to system. Each $\vec{x}_i$ has dimension somewhere between 20 and 50, and $N$ is a few thousand. I'd like to solve them simultaneously, and the motivation is as follows. The function evaluation is computationally intensive and has been sped up significantly using GPU computing (which leaves the answer on the GPU). The nonlinear solver, by comparison, is fairly cheap, since it only involves inverses of small-ish matrices, and CUDA performance on small matrices is fairly uninspiring. Solving several systems simultaneously (maybe 5-10, depending on the size) would make CUDA acceleration worthwhile for the solver step as well. The trade space between dividing cores across function evaluations for multiple parameter sets and the possible gains in reduced overhead from solving one larger nonlinear system is not immediately obvious, but it seems worth exploring -- unless simultaneous solving doesn't converge, in which case it's a non-starter.

Unfortunately, my naive attempt didn't work. I constructed the stacked system $f([\vec{x}_0 ; ... ; \vec{x}_N]; [c_0 ; ... ; c_N]) = 0$ (i.e., concatenating the vectors/parameters) and calculated the block-diagonal Jacobian $$ \left[ \begin{array}{ccc} J_0 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & J_N \end{array} \right] $$ where each block $J_i$ is the Jacobian of an individual system. Neither MATLAB's nonlinear solvers -- (Powell's?) dogleg method, trust-region reflective (I'm not too familiar with what goes into that one yet), and Levenberg-Marquardt -- nor a homegrown backtracking Newton method could reach our rather tight tolerance, even for just $N=2$ (the trust regions/steps just got too small). The initial guess I gave was very good for solving the individual systems: the $c$'s don't change much from $i$ to $i+1$, so yesterday's answer is pretty close to today's, and it usually takes fewer than 4 steps to solve for the update.
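For concreteness, here is a minimal sketch of the stacking approach in Python/SciPy (I'm actually in MATLAB; this is just to make the structure explicit). The toy `f_single` and all names are hypothetical stand-ins for the real, expensive $f(\vec{x}; c)$:

```python
import numpy as np
from scipy.linalg import block_diag
from scipy.optimize import root

# Hypothetical, cheap stand-in for the expensive f(x; c),
# chosen only so the stacking structure is visible.
def f_single(x, c):
    return x**3 - c

def jac_single(x, c):
    return np.diag(3.0 * x**2)

def f_stacked(z, cs, n):
    # Concatenate the residuals of all N independent systems.
    return np.concatenate([f_single(z[i*n:(i+1)*n], c)
                           for i, c in enumerate(cs)])

def jac_stacked(z, cs, n):
    # Block-diagonal Jacobian: off-diagonal blocks are exactly zero,
    # because system i never sees x_j for j != i.
    return block_diag(*[jac_single(z[i*n:(i+1)*n], c)
                        for i, c in enumerate(cs)])

n = 3                                          # per-system dimension
cs = [np.array([1.0, 8.0, 27.0]),              # two parameter sets
      np.array([2.0, 3.0, 4.0])]
z0 = np.ones(n * len(cs))                      # stacked initial guess
sol = root(f_stacked, z0, args=(cs, n), jac=jac_stacked, method='lm')
```

On this toy problem Levenberg-Marquardt converges fine, of course; the trouble in my real problem is that the global trust region/step-size control couples systems that are mathematically independent.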

So my question is: are there any "general" nonlinear solver methods that gracefully handle block-diagonal Jacobian matrices? (I could give up the analytical Jacobian if that helps.)
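To illustrate the structure I'm hoping a solver could exploit: since the off-diagonal blocks are exactly zero, a Newton step on the stacked system decouples into $N$ small solves, which can be done in one batched call without ever forming the big matrix. A minimal NumPy sketch of this idea (all names and the toy $f$ are my own, not from any library; a real version would add per-system damping/convergence checks, which is exactly where the global trust region hurt me):

```python
import numpy as np

def batched_newton(f, jac, X0, C, tol=1e-10, maxit=100):
    """Plain Newton on N independent systems at once.

    X0 : (N, n) stacked initial guesses;  C : (N, ...) parameters.
    f(X, C)   -> (N, n)    residuals
    jac(X, C) -> (N, n, n) per-system Jacobians
    The block-diagonal matrix is never formed: each iteration is one
    batched solve of N small n-by-n linear systems.
    """
    X = X0.copy()
    for _ in range(maxit):
        F = f(X, C)
        if np.abs(F).max() < tol:       # global check for brevity;
            break                       # per-system checks/damping go here
        J = jac(X, C)
        # Batched solve: (N, n, n) @ dx = (N, n)
        X -= np.linalg.solve(J, F[..., None])[..., 0]
    return X

# Toy stand-in: f(x; c) = x^3 - c elementwise, Jacobian diag(3 x^2).
f = lambda X, C: X**3 - C
jac = lambda X, C: np.eye(X.shape[1]) * (3.0 * X**2)[:, None, :]
C = np.arange(1.0, 16.0).reshape(5, 3)  # 5 systems of dimension 3
X = batched_newton(f, jac, np.ones_like(C), C)
```

The batched solve is the kind of operation that maps well onto the GPU (e.g. cuBLAS/cuSOLVER batched routines), which is why I'm hoping something like this exists in a robust, globalized form.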