I want to solve a large system of linear equations $A x = b$ arising from the discretization of a PDE on a high-dimensional state space. For now I have $3$ dimensions, but I will eventually increase this number. Similar questions have been asked before, but they usually don't quite fit the specifics of my problem, which I believe could be leveraged for higher computation speed.
Consider the state space $S = [\underline{s}_1 , \overline{s}_1] \times [\underline{s}_2 , \overline{s}_2] \times [\underline{s}_3 , \overline{s}_3]$ and the following PDE:
$$ \dot{V}_t(s) = g_t(s) + \frac{\partial V_t (s)}{\partial s_1} \mu_1(s) + \frac{\partial V_t (s)}{\partial s_2}\mu_2(s) + \frac{\partial V_t (s)}{\partial s_3}\mu_3(s) - \frac{1}{2}\frac{\partial^2 V_t(s) }{\partial s_3^2} \sigma_3(s) $$
I discretize the state space $\tilde{S} = \text{Vect}( \tilde{S}_1 \times \tilde{S}_2 \times \tilde{S}_3)$ with grids of length $N_j$ and step size $\Delta_j$ in each dimension. The size of my state space is currently $N = N_1 \cdot N_2 \cdot N_3 \simeq 5 \times 10^4$ but is expected to grow to $10^7$ with the additional states. I use a semi-explicit scheme (because I don't use Newton's method for $g_t(s)$) and the upwind method for the partial derivatives that depend on the drifts $\mu_j(s)$ (not shown in the equation below for simplicity).
$$ \frac{V_{t+\Delta_t} (s)- V_t(s)}{\Delta_t} = g_{t} (s) + \frac{V_{t+\Delta_t} (s + \Delta_1 e_1)- V_{t+\Delta_t}(s)}{\Delta_1} \mu_1 (s) + \frac{V_{t+\Delta_t} (s + \Delta_2 e_2)- V_{t+\Delta_t}(s)}{\Delta_2} \mu_2 (s) + \frac{V_{t+\Delta_t} (s + \Delta_3 e_3)- V_{t+\Delta_t}(s)}{\Delta_3} \mu_3 (s) - \frac{1}{2} \frac{V_{t+\Delta_t} (s + \Delta_3 e_3) - 2V_{t+\Delta_t} (s ) + V_{t+\Delta_t}(s-\Delta_3 e_3)}{\Delta_3^2} \sigma_3 (s) $$
where $e_j$ is the canonical vector of zeros with a $1$ in position $j$. Stacking the equations yields a system of linear equations of dimension $N$
$$ \left ( I_N \frac{1}{\Delta_t} - A \right ) X_{t+\Delta_t} = g_t + \frac{1}{\Delta_t} X_t $$
Let $\tilde{A} = \frac{1}{\Delta_t} I_N - A $ and $\tilde{b} = g_t + \frac{1}{\Delta_t} X_t $.
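For concreteness, here is a minimal Python/scipy sketch of how $\tilde{A}$ can be assembled (toy grid sizes and random placeholder coefficients, not my actual model; the scheme as written above uses only forward differences, so only $5$ of the diagonals appear here):

```python
import numpy as np
import scipy.sparse as sp

# Small placeholder grid; the real problem has N1*N2*N3 ~ 5e4.
N1, N2, N3 = 8, 8, 8
N = N1 * N2 * N3
dt, d1, d2, d3 = 0.01, 0.1, 0.1, 0.1

# Hypothetical coefficient fields on the flattened grid.
rng = np.random.default_rng(0)
mu1, mu2, mu3 = rng.random(N), rng.random(N), rng.random(N)
sig3 = rng.random(N) + 0.5

# Flattened index s = i1 + N1*i2 + (N1*N2)*i3, so the neighbour in
# dimension j sits at offset 1, N1, N1*N2 respectively: the
# "outrigger" diagonals.
o1, o2, o3 = 1, N1, N1 * N2

diag0 = -mu1 / d1 - mu2 / d2 - mu3 / d3 + sig3 / d3**2
A = sp.diags(
    [diag0,
     mu1[:N - o1] / d1,                                # offset +1
     mu2[:N - o2] / d2,                                # offset +N1
     mu3[:N - o3] / d3 - sig3[:N - o3] / (2 * d3**2),  # offset +N1*N2
     -sig3[o3:] / (2 * d3**2)],                        # offset -N1*N2
    offsets=[0, o1, o2, o3, -o3],
    format="csr",
)
A_tilde = sp.eye(N) / dt - A
# Boundary wrap-around rows are ignored here for brevity;
# real code must correct them.
```

With sign-dependent upwinding the $-1$ and $-N_1$ sub-diagonals get populated as well, which produces the full outrigger pattern.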
Now, a couple of general remarks regarding the structure of the problem $\tilde{A} X = \tilde{b} $ :
- As is common with high dimensional dynamical problems in physics or finance, the matrix $A$ is very sparse and banded.
- However, $A$ is not tri-(or penta-, ...)-diagonal and its bandwidth is relatively large (approximately $N_1 \cdot N_2$ ).
- However, despite the large bandwidth, most of the diagonals are zero. The number of non-zero diagonals is of order $1+2d$, where $d$ is the dimension of the state space ($3$ in my case). Matrices of this type are sometimes called "outrigger" matrices (see here). Mine is tri-diagonal (the main, first super- and first sub-diagonal), plus a super- and sub-diagonal at offset $N_1$ and another super- and sub-diagonal at offset $N_1 \cdot N_2$.
- $A$ is not exactly symmetric because of the upwind scheme. In principle I could symmetrize it by using central differences, but that is not always possible (some states always have a positive drift) and it tends to hurt stability.
- Finally, the coefficients of the matrix $A$ will change at every iteration of my problem, but the location of the coefficients (the shape of the matrix) is known ex-ante and is constant.
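Because the pattern is fixed, the sparse structure can be built once and only the stored values overwritten at each iteration, with no re-allocation. A scipy sketch with a hypothetical 7-diagonal pattern (the mapping from new coefficients to CSR storage order has to be worked out once, up front):

```python
import numpy as np
import scipy.sparse as sp

N, N1 = 1000, 10
offsets = [0, 1, -1, N1, -N1, N1 * N1, -N1 * N1]  # outrigger pattern
# Build the CSR structure once, with placeholder values.
A = sp.diags([np.ones(N - abs(k)) for k in offsets], offsets, format="csr")
structure_nnz = A.nnz

# Each time step: overwrite only the stored values in place.
rng = np.random.default_rng(0)
A.data[:] = rng.random(A.nnz) + 1.0  # new coefficients, same pattern
```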
Here is a graphical example of the full matrix $A$, with a zoom on the top-left corner.
Currently, solving this system with A\b in MATLAB takes around $3$ seconds on my computer, but the computation time increases steeply as I add grid points or dimensions. I would like advice on the most appropriate numerical method given the specific structure of my problem.
Here are some steps I have tried doing:
- Using a built-in iterative method in MATLAB such as GMRES or QMR rather than the backslash operator. However, I find that the approximate solution is not accurate (the norm of the difference from the exact solution is around $10^2$), which creates a lot of instability in the convergence.
- Using the truncated Neumann series $\left ( I_N - B \right )^{-1} \simeq I_N + B + B^2 + B^3 + \dots + B^K$ to avoid solving the system, but again the approximation error is large.
- Using bandwidth-reducing algorithms such as symmetric reverse Cuthill-McKee, but given that my non-zero diagonals are unevenly spaced, I only get a marginal reduction in bandwidth (a factor of about $4$) and computing time.
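To make the iterative-method attempt concrete, here is a scipy sketch of GMRES with an incomplete-LU preconditioner on a hypothetical diagonally dominant test matrix (not my actual operator); the preconditioner is what I understand is usually needed to get the residual down:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Hypothetical diagonally dominant test matrix, just to illustrate
# the mechanics of preconditioned GMRES.
N = 500
rng = np.random.default_rng(0)
A = sp.diags([rng.random(N) + 3.0, -np.ones(N - 1), -np.ones(N - 10)],
             [0, 1, 10], format="csc")
b = rng.random(N)

# Incomplete-LU factorization used as a preconditioner.
ilu = spla.spilu(A, drop_tol=1e-5, fill_factor=10)
M = spla.LinearOperator((N, N), ilu.solve)

x, info = spla.gmres(A, b, M=M, restart=50, maxiter=200)
residual = np.linalg.norm(A @ x - b)  # info == 0 signals convergence
```

Since my coefficients change every iteration, recomputing the ILU factors each step may dominate the cost; I don't know whether refreshing the preconditioner only occasionally is standard practice.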
Note that with the backslash operator, and ignoring run time, I do find the correct solution. My problem is purely one of speed and of choosing the right numerical algorithm.
I don't come from the maths / CS / Engineering / Physics field so I am not on top of the literature when it comes to state-of-the-art numerical algorithms. What I would like to ask specifically is the following:
- Given that my matrix $A$ (and hence $\tilde{A}$) has a known sparsity pattern, is it possible to compute a generic permutation (or preconditioning) matrix $P$ once at the beginning of my code (even at a high one-off cost) and reuse it at every iteration? For instance, if that matrix only contains $1$'s and $-1$'s, it could be reused even though the coefficients of $A$ change.
- Can I leverage other aspects of my problem to gain computing time? For instance, is it possible to parallelize it in a certain way? Use a GPU? I do have access to a computing cluster but I feel that even with more resources I would first need to apply some clever steps to optimize my code (also my current computer has 12 CPU cores, 38 GPU cores and 96GB of RAM).
- What are the most advanced/robust algorithms used to solve high-dimensional PDEs like mine?
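To make the first question concrete, the kind of reuse I have in mind looks like this in scipy (hypothetical 5-diagonal matrix; `reverse_cuthill_mckee` only looks at the sparsity pattern, so `perm` can be computed once up front and reapplied as the values change):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import spsolve

# Hypothetical 5-diagonal matrix standing in for A-tilde.
N = 300
rng = np.random.default_rng(0)
offsets = [0, 1, -1, 15, -15]
A = sp.diags(
    [rng.random(N - abs(k)) + (5.0 if k == 0 else 0.0) for k in offsets],
    offsets, format="csr")
b = rng.random(N)

# The permutation depends only on the sparsity pattern, so it is
# computed once and reused at every iteration.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)

Ap = A[perm][:, perm].tocsc()   # permuted system
y = spsolve(Ap, b[perm])
x = np.empty(N)
x[perm] = y                     # map back to the original ordering
```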
I have read a lot on sparse matrix algorithms and iterative algorithms recently, but may well have missed or misunderstood some of the existing solutions given that I am new to the field. I would be more than happy to hear your general thoughts and advice if you have dealt with a similar problem.
I code mainly using MATLAB (and Python numpy) so any specific advice would be appreciated.
Some resources I have found useful so far:
- Sparse matrix algorithms by Tim Davis (although it seems that most of it is already built into MATLAB?)
- Iterative Methods for Sparse Linear Systems by Yousef Saad
- Lecture Notes on Numerical Linear Algebra and Solution of Linear Systems from MIT
- Lecture Notes on Narrow banded linear systems from the University of Illinois
- Review of GPU-Accelerated Linear Solvers