I want to solve a linear system of equations $Ax=b$, where $A$, $x$, and $b$ are all complex.
The $n\times n$ matrix $A$ is dense and has no special structure: it is not symmetric, not real, and not diagonally dominant.
Up to now the matrix size has been modest, $n = 30\,000$, which still fits in memory, and I have been very successful using LU decomposition (LAPACK) to solve the system. If the size increases somewhat further, I can use ScaLAPACK to store the matrix in distributed memory on a cluster. However, this approach does not scale well: the memory requirement is $O(n^2)$, and ideally I'd like to go up to about $n = 3\cdot 10^6$. That would require 144 TB, which is too much even to store on hard disks.
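To make the scaling concrete, here is a quick back-of-the-envelope check of the storage numbers above (assuming complex double precision, i.e. 16 bytes per entry):

```python
def dense_matrix_bytes(n, bytes_per_entry=16):
    """Storage for a dense n x n complex double-precision matrix.

    Each entry is two 8-byte floats (real and imaginary part).
    """
    return n * n * bytes_per_entry

# Current size: n = 30 000 -> 14.4 GB, still fits in RAM.
print(dense_matrix_bytes(30_000) / 1e9)       # 14.4 (GB)

# Target size: n = 3e6 -> 144 TB, far beyond RAM or disk.
print(dense_matrix_bytes(3_000_000) / 1e12)   # 144.0 (TB)
```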
Computing the matrix entries is not particularly cheap, but it can be done on demand, so it should be possible to use an iterative method without actually storing the matrix. I have looked into some of these methods, but they are not robust enough for my problem.
For instance, Gauss–Seidel and Jacobi do not require keeping the full matrix in memory at all times, but they do not converge in my case. Are there other good iterative approaches for solving this?
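To illustrate what I mean by solving without storing the matrix: a Krylov method such as GMRES only needs matrix–vector products, which I could compute on demand. A minimal scipy sketch of that setup (the small random matrix and the name `apply_A` are just stand-ins for my on-demand computation, not my actual code):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 1000
rng = np.random.default_rng(0)

# Stand-in for the real problem: a well-conditioned dense complex
# matrix close to the identity, so GMRES converges quickly here.
A = np.eye(n) + 0.1 * (rng.standard_normal((n, n))
                       + 1j * rng.standard_normal((n, n))) / np.sqrt(n)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def apply_A(x):
    # In the real code this would compute A @ x on demand,
    # without ever holding all of A in memory.
    return A @ x

op = LinearOperator((n, n), matvec=apply_A, dtype=complex)
x, info = gmres(op, b, atol=1e-10, restart=50, maxiter=200)

# info == 0 means GMRES converged; check the relative residual.
print(info, np.linalg.norm(apply_A(x) - b) / np.linalg.norm(b))
```

The catch, of course, is that Krylov convergence for a general non-symmetric, non-diagonally-dominant complex matrix can be arbitrarily slow without a good preconditioner.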
Perhaps the matrix $A$ can be preconditioned to improve its diagonal dominance; however, this must not require storing another big matrix, since that would not fit in memory or on disk either.
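For example, a diagonal (Jacobi) preconditioner needs only the $n$ diagonal entries of $A$, i.e. $O(n)$ storage, so something along these lines would still be feasible memory-wise. Again a scipy sketch with a small stand-in matrix (whether this particular preconditioner actually helps depends entirely on the spectrum of the real $A$):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 500
rng = np.random.default_rng(1)

# Small stand-in: complex matrix with a strong, varying diagonal
# plus a weak dense perturbation.
A = np.diag(1.0 + 10.0 * rng.random(n) + 1j * rng.random(n)) \
    + 0.1 * (rng.standard_normal((n, n))
             + 1j * rng.standard_normal((n, n))) / np.sqrt(n)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Jacobi preconditioner: only the diagonal is stored -- O(n) memory.
d = np.diag(A).copy()
M = LinearOperator((n, n), matvec=lambda v: v / d, dtype=complex)

op = LinearOperator((n, n), matvec=lambda v: A @ v, dtype=complex)
x, info = gmres(op, b, M=M, atol=1e-10, maxiter=500)
print(info)  # 0 on convergence
```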