Moore-Penrose pseudoinverse of a Laplacian


I am trying to compute the Moore-Penrose pseudoinverse of a very large, sparse, rank-deficient, singular, square matrix (75000x75000, nearly full rank). I realize the inverse will be very dense. I have been trying some brute-force approaches by converting it to a dense matrix (~25 GB) on a large-memory machine (124 GB). Both the lstsq-based scipy.linalg.pinv and some SVD approaches (numpy.linalg.pinv) quickly exhaust the memory. The only approach that has worked is a slow but memory-efficient SVD (scipy.linalg.pinv2). I am exploring what other techniques can balance memory efficiency and speed, although I have yet to try scipy.sparse.linalg.inv or scipy.sparse.linalg.svds for a thorough comparison. I don't think any iterative approaches will work well because I can't provide a decent starting point. I think this leaves QR as a prime candidate. I have been exploring some of the methods listed in this paper. I have the following questions:

  1. I want to sample selected entries of the MPI to calculate the distance between a large number of graph nodes in an ad hoc fashion. Is there a better way to sample selected elements of the MPI than persisting the whole matrix?
  2. Are there any algorithms particularly well suited to my matrix type?
  3. Is QR factorization my best bet? Are there other approaches I should be considering?
  4. If QR, what memory-efficient QR implementations exist? Or will I need to hand-roll one from a paper?
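On question 1: for the Laplacian of a connected graph, individual entries of the pseudoinverse can be sampled with one sparse linear solve per column, without ever materializing the dense pseudoinverse. A minimal sketch, assuming a connected graph (so the null space of L is spanned by the all-ones vector); the small path graph and the `pinv_column` helper are illustrative, not part of the question:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# Hypothetical stand-in: the Laplacian of a path graph on 6 nodes.
n = 6
A = sp.diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1], format="csr")
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A

def pinv_column(L, i):
    """Column i of L+ via one sparse solve, never forming L+.

    For a connected graph null(L) = span{1}, so project the
    right-hand side and the solution onto the complement of 1."""
    n = L.shape[0]
    b = -np.full(n, 1.0 / n)
    b[i] += 1.0                       # b = e_i - (1/n) 1, lies in range(L)
    x, info = cg(L, b, atol=1e-12)    # CG works: L is symmetric PSD
    if info != 0:
        raise RuntimeError("CG did not converge")
    return x - x.mean()               # re-project onto the complement of 1

# Distances from sampled entries of L+ (effective resistance):
#   r_ij = L+_ii + L+_jj - 2 L+_ij
ci, cj = pinv_column(L, 0), pinv_column(L, 3)
r = ci[0] + cj[3] - 2 * ci[3]
```

Each sampled column costs one CG solve against the sparse L, which is far cheaper in memory than any dense factorization; a preconditioner can be added if CG converges slowly on the real 75000x75000 matrix.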

1 Answer


There is a technique called randomized SVD; Facebook released research on it (and the fbpca library) years ago. Do not turn your matrix into a dense matrix; there are libraries that work directly with the CSR structure. From the randomized SVD write-up:

We consider a few cases:

  - Large dense matrices, up to 100,000 × 200,000 in size
  - Large sparse matrices, up to 10⁷ × 10⁷ in size with O(10⁹) non-zero elements
  - A comparison to Apache Spark's distributed SVD implementation
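As a sketch of how this applies here: scikit-learn's `sklearn.utils.extmath.randomized_svd` (fbpca offers a similar interface) accepts scipy sparse matrices directly, so the Laplacian never needs to be densified. One caveat, assuming the goal is the pseudoinverse: the top-k randomized SVD approximates L itself, while L⁺ is dominated by the *smallest* non-zero singular values, so the truncated pseudoinverse below is the pseudoinverse of the rank-k approximation, not necessarily a good approximation of L⁺. The small path-graph Laplacian is a hypothetical stand-in:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.utils.extmath import randomized_svd  # assumption: sklearn available

# Small CSR stand-in for the 75000x75000 Laplacian.
n = 100
A = sp.diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1], format="csr")
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A

k = 20  # target rank of the approximation
U, s, Vt = randomized_svd(L, n_components=k, n_iter=5, random_state=0)

# Truncated pseudoinverse of the rank-k approximation:
# invert only singular values above a numerical cutoff.
cutoff = s.max() * n * np.finfo(float).eps
s_inv = np.array([1.0 / sv if sv > cutoff else 0.0 for sv in s])
L_pinv_k = (Vt.T * s_inv) @ U.T  # n x n, but built from thin factors
```

Note that the thin factors U, s, Vt (n*k storage each) can be kept instead of the dense n x n product, which matters at n = 75000.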

Performance is also a function of the degree of sparsity: matrices aren't simply sparse or not sparse, sparsity is the percentage of entries that are non-zero. The scipy.sparse constructors expose this directly as a density parameter, e.g. scipy.sparse.rand:

from scipy.sparse import rand

matrix = rand(3, 4, density=0.25, format="csr", random_state=42)