Huge matrix multiplication

5.1k Views Asked by At

I have a sparse A matrix stored in column major order (it is intrisically column major) of ~80GB and another sparse matrix B relatively small (1GB) which can be loaded in row or column major with no particular effort. I need to compute a straight matrix product S = AB. My problem is that I have only 64 GB of RAM (I usually use the Eigen c++ library ) and i need to compute the product by blocks.

I was thinking to re store the A matrix in row major (even it could imply a great increase in terms of storage) and later load the new matrix A by blocks, N rows at time, compute and store the various products and at the end assembly all the blocks together.

Do you have any better ideas?

2

There are 2 best solutions below

3
On

Let A be of size m by n and B be of size n by k. You might consider:

  1. Store B by rows rather than columns.

  2. Break up the A matrix into blocks of size m by n1. Break up B into blocks of size n1 by k.

  3. Multiply the blocks of A times the corresponding blocks of B, and then sum/merge the results to get AB in row major form.

Keep in mind that AB will be of size m by k and might be very dense.

0
On

Optimal matrix representation and multiplication algorithm is highly dependent on sparsity patterns of $A$, $B$ and $S$ matrices.

I assume, that all matrices are stored in the compressed column storage with sorted row indices.

For such huge matrix $A$ it is possible, that computing $A^T$ (i.e. converting the compressed column storage to the compressed row storage) can be too costly. But if possible, then one may compute $S' = B'A'$ simply by splitting $A' = \left[A_1', A_2',\dots, A_k'\right]$ and computing $B'A_i'$ for $i=1,\dots,k$ using the regular sparse-sparse matrix multiplication for matrices in CCS format. This can be faster, than computing $(A')'\times B$ (i.e. converting the matrix $A$ to the CRS storage first, and splitting the resulting matrix by rows), when $S$ is sparse, but slower, when $S$ is nearly dense.

I would consider the following algorithm:

  1. store the matrix $A$ in the CCS format, but split it by rows, i.e. $A=\text{col}(A_1, A_2, \dots, A_k)$.
  2. compute $S_i = A_iB$ by the regular sparse-sparse matrix multiplication algorithm for matrices in the CCS format.
  3. assembly the matrix $S = \text{col}(S_1, S_2, \dots, S_k)$ if required.

If $S_i$ is too large to be stored in RAM, then the matrix $B$ should also be split by columns. This algorithm does not require costly matrix transpositions, required partitioning is quite cheap, IO data transfer is low.