I want to solve a large system of linear equations $A x = b$ arising from the discretization of a PDE on a high-dimensional state space. For now I have $3$ dimensions, but I will eventually increase this number. Similar questions have been asked before, but they usually don't quite fit the specifics of my problem, which I believe could be leveraged for higher computation speed.
Consider the state space $S = [\underline{s}_1 , \overline{s}_1] \times [\underline{s}_2 , \overline{s}_2] \times [\underline{s}_3 , \overline{s}_3]$ and the following PDE:
$$ \dot{V}_t(s) = g_t(s) + \frac{\partial V_t (s)}{\partial s_1} \mu_1(s) + \frac{\partial V_t (s)}{\partial s_2}\mu_2(s) + \frac{\partial V_t (s)}{\partial s_3}\mu_3(s) - \frac{1}{2}\frac{\partial^2 V_t(s) }{\partial s_3^2} \sigma_3(s) $$
I discretize the state space $\tilde{S} = \text{Vect}( \tilde{S}_1 \times \tilde{S}_2 \times \tilde{S}_3)$ with grids of length $N_j$ and step size $\Delta_j$ in each dimension. The size of my state space is currently $N = N_1 \cdot N_2 \cdot N_3 \simeq 5 \times 10^4$ but is expected to grow to $10^7$ with the additional states. I use a semi-explicit scheme (because I don't use Newton's method for $g_t(s)$) and the upwind method for the partial derivatives that depend on the drifts $\mu_j(s)$ (not shown in the equation below for simplicity).
$$ \frac{V_{t+\Delta_t} (s)- V_t(s)}{\Delta_t} = g_{t} (s) + \frac{V_{t+\Delta_t} (s + \Delta_1 e_1)- V_{t+\Delta_t}(s)}{\Delta_1} \mu_1 (s) + \frac{V_{t+\Delta_t} (s + \Delta_2 e_2)- V_{t+\Delta_t}(s)}{\Delta_2} \mu_2 (s) + \frac{V_{t+\Delta_t} (s + \Delta_3 e_3)- V_{t+\Delta_t}(s)}{\Delta_3} \mu_3 (s) - \frac{1}{2} \frac{V_{t+\Delta_t} (s + \Delta_3 e_3) - 2V_{t+\Delta_t} (s ) + V_{t+\Delta_t}(s-\Delta_3 e_3)}{\Delta_3^2} \sigma_3 (s) $$
where $e_j$ is the canonical vector of zeros with a $1$ in position $j$. Stacking the equations yields a system of linear equations of dimension $N$
$$ \left ( I_N \frac{1}{\Delta_t} - A \right ) X_{t+\Delta_t} = g_t + \frac{1}{\Delta_t} X_t $$
Let $\tilde{A} = \frac{1}{\Delta_t} I_N - A $ and $\tilde{b} = g_t + \frac{1}{\Delta_t} X_t $.
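For concreteness, here is a minimal Python/scipy sketch of how $\tilde{A}$ can be assembled (toy grid sizes and random placeholder coefficients, not my actual model; the scheme as written above uses only forward differences, so only $5$ of the diagonals appear here):

```python
import numpy as np
import scipy.sparse as sp

# Small placeholder grid; the real problem has N1*N2*N3 ~ 5e4.
N1, N2, N3 = 8, 8, 8
N = N1 * N2 * N3
dt, d1, d2, d3 = 0.01, 0.1, 0.1, 0.1

# Hypothetical coefficient fields on the flattened grid.
rng = np.random.default_rng(0)
mu1, mu2, mu3 = rng.random(N), rng.random(N), rng.random(N)
sig3 = rng.random(N) + 0.5

# Flattened index s = i1 + N1*i2 + (N1*N2)*i3, so the neighbour in
# dimension j sits at offset 1, N1, N1*N2 respectively: the
# "outrigger" diagonals.
o1, o2, o3 = 1, N1, N1 * N2

diag0 = -mu1 / d1 - mu2 / d2 - mu3 / d3 + sig3 / d3**2
A = sp.diags(
    [diag0,
     mu1[:N - o1] / d1,                                # offset +1
     mu2[:N - o2] / d2,                                # offset +N1
     mu3[:N - o3] / d3 - sig3[:N - o3] / (2 * d3**2),  # offset +N1*N2
     -sig3[o3:] / (2 * d3**2)],                        # offset -N1*N2
    offsets=[0, o1, o2, o3, -o3],
    format="csr",
)
A_tilde = sp.eye(N) / dt - A
# Boundary wrap-around rows are ignored here for brevity;
# real code must correct them.
```

With sign-dependent upwinding the $-1$ and $-N_1$ sub-diagonals get populated as well, which produces the full outrigger pattern.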
Now, a couple of general remarks regarding the structure of the problem $\tilde{A} X = \tilde{b} $ :
- As is common with high dimensional dynamical problems in physics or finance, the matrix $A$ is very sparse and banded.
- However, $A$ is not tri-(or penta-, ...)-diagonal and its bandwidth is relatively large (approximately $N_1 \cdot N_2$ ).
- However, despite the large bandwidth, most of the diagonals are zero. The number of non-zero diagonals is of order $1+2d$, where $d$ is the dimension of the state space ($3$ in my case). Matrices of this type are sometimes called "outrigger" matrices (see here). Mine is tri-diagonal (the main, first super- and first sub-diagonal), plus a super- and sub-diagonal at offset $N_1$ and another super- and sub-diagonal at offset $N_1 \cdot N_2$.
- $A$ is not exactly symmetric because of the upwind scheme. In principle I could symmetrize it by using central differences, but that is not always possible (some states always have a positive drift) and it tends to hurt stability.
- Finally, the coefficients of the matrix $A$ will change at every iteration of my problem, but the location of the coefficients (the shape of the matrix) is known ex-ante and is constant.
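Because the pattern is fixed, the sparse structure can be built once and only the stored values overwritten at each iteration, with no re-allocation. A scipy sketch with a hypothetical 7-diagonal pattern (the mapping from new coefficients to CSR storage order has to be worked out once, up front):

```python
import numpy as np
import scipy.sparse as sp

N, N1 = 1000, 10
offsets = [0, 1, -1, N1, -N1, N1 * N1, -N1 * N1]  # outrigger pattern
# Build the CSR structure once, with placeholder values.
A = sp.diags([np.ones(N - abs(k)) for k in offsets], offsets, format="csr")
structure_nnz = A.nnz

# Each time step: overwrite only the stored values in place.
rng = np.random.default_rng(0)
A.data[:] = rng.random(A.nnz) + 1.0  # new coefficients, same pattern
```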
Here is a graphical example of the full matrix $A$, with a zoom on the top-left corner.
Currently, solving this system with A\b in MATLAB takes around $3$ seconds on my computer, but the computation time increases steeply as I add grid points or dimensions. I would like advice on the most appropriate numerical method given the specific structure of my problem.
Here are some steps I have tried doing:
- Using a built-in iterative method in MATLAB such as GMRES or QMR rather than the backslash operator. However, I find that the approximate solution is not accurate (the norm of the difference from the exact solution is around $10^2$), which creates a lot of instability in the convergence.
- Using the truncated Neumann series $\left ( I_N - B \right )^{-1} \simeq I_N + B + B^2 + B^3 + \dots + B^K$ to avoid solving the system, but again the approximation error is large.
- Using bandwidth-reducing algorithms such as symmetric reverse Cuthill-McKee, but given that my non-zero diagonals are unevenly spaced, I only get a marginal reduction in bandwidth (a factor of about $4$) and computing time.
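To make the iterative-method attempt concrete, here is a scipy sketch of GMRES with an incomplete-LU preconditioner on a hypothetical diagonally dominant test matrix (not my actual operator); the preconditioner is what I understand is usually needed to get the residual down:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Hypothetical diagonally dominant test matrix, just to illustrate
# the mechanics of preconditioned GMRES.
N = 500
rng = np.random.default_rng(0)
A = sp.diags([rng.random(N) + 3.0, -np.ones(N - 1), -np.ones(N - 10)],
             [0, 1, 10], format="csc")
b = rng.random(N)

# Incomplete-LU factorization used as a preconditioner.
ilu = spla.spilu(A, drop_tol=1e-5, fill_factor=10)
M = spla.LinearOperator((N, N), ilu.solve)

x, info = spla.gmres(A, b, M=M, restart=50, maxiter=200)
residual = np.linalg.norm(A @ x - b)  # info == 0 signals convergence
```

Since my coefficients change every iteration, recomputing the ILU factors each step may dominate the cost; I don't know whether refreshing the preconditioner only occasionally is standard practice.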
Note that with the backslash operator, and ignoring run time, I do find the correct solution. My problem is purely one of speed and of choosing the right numerical algorithm.
I don't come from the maths / CS / Engineering / Physics field so I am not on top of the literature when it comes to state-of-the-art numerical algorithms. What I would like to ask specifically is the following:
- Given that my matrix $A$ (and hence $\tilde{A}$) has a known sparsity pattern, is it possible to compute a generic permutation (or preconditioning) matrix $P$ once at the beginning of my code (even at a high one-off cost) and reuse it at every iteration? For instance, if that matrix only contains $1$'s and $-1$'s, it could be reused even though the coefficients of $A$ change.
- Can I leverage other aspects of my problem to gain computing time? For instance, is it possible to parallelize it in a certain way? Use a GPU? I do have access to a computing cluster but I feel that even with more resources I would first need to apply some clever steps to optimize my code (also my current computer has 12 CPU cores, 38 GPU cores and 96GB of RAM).
- What are the most advanced/robust algorithms used to solve high-dimensional PDEs like mine?
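To make the first question concrete, the kind of reuse I have in mind looks like this in scipy (hypothetical 5-diagonal matrix; `reverse_cuthill_mckee` only looks at the sparsity pattern, so `perm` can be computed once up front and reapplied as the values change):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import spsolve

# Hypothetical 5-diagonal matrix standing in for A-tilde.
N = 300
rng = np.random.default_rng(0)
offsets = [0, 1, -1, 15, -15]
A = sp.diags(
    [rng.random(N - abs(k)) + (5.0 if k == 0 else 0.0) for k in offsets],
    offsets, format="csr")
b = rng.random(N)

# The permutation depends only on the sparsity pattern, so it is
# computed once and reused at every iteration.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)

Ap = A[perm][:, perm].tocsc()   # permuted system
y = spsolve(Ap, b[perm])
x = np.empty(N)
x[perm] = y                     # map back to the original ordering
```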
I have read a lot on sparse matrix algorithms and iterative algorithms recently, but may well have missed or misunderstood some of the existing solutions given that I am new to the field. I would be more than happy to hear your general thoughts and advice if you have dealt with a similar problem.
I code mainly using MATLAB (and Python numpy) so any specific advice would be appreciated.
Some resources I have found useful so far:
- Sparse matrix algorithms by Tim Davis (although it seems that most of it is already built into MATLAB?)
- Iterative Methods for Sparse Linear Systems by Yousef Saad
- Lecture Notes on Numerical Linear Algebra and Solution of Linear Systems from MIT
- Lecture Notes on Narrow banded linear systems from the University of Illinois
- Review of GPU-Accelerated Linear Solvers