Huge matrix multiplication

Question

Huge matrix multiplication

5.1k Views Asked by Bumbble Comm At 11 May 2026 - 9:26

I have a sparse A matrix stored in column major order (it is intrisically column major) of ~80GB and another sparse matrix B relatively small (1GB) which can be loaded in row or column major with no particular effort. I need to compute a straight matrix product S = AB. My problem is that I have only 64 GB of RAM (I usually use the Eigen c++ library ) and i need to compute the product by blocks.

I was thinking to re store the A matrix in row major (even it could imply a great increase in terms of storage) and later load the new matrix A by blocks, N rows at time, compute and store the various products and at the end assembly all the blocks together.

Do you have any better ideas?

Original Q&A

There are 2 best solutions below

**Bumbble Comm** · Answer 1 · 2017-10-25 18:41:42

Let A be of size m by n and B be of size n by k. You might consider:

Store B by rows rather than columns.
Break up the A matrix into blocks of size m by n1. Break up B into blocks of size n1 by k.
Multiply the blocks of A times the corresponding blocks of B, and then sum/merge the results to get AB in row major form.

Keep in mind that AB will be of size m by k and might be very dense.

**user52976** · Answer 2 · 2017-10-25 23:32:51

Optimal matrix representation and multiplication algorithm is highly dependent on sparsity patterns of $A$, $B$ and $S$ matrices.

I assume, that all matrices are stored in the compressed column storage with sorted row indices.

For such huge matrix $A$ it is possible, that computing $A^T$ (i.e. converting the compressed column storage to the compressed row storage) can be too costly. But if possible, then one may compute $S' = B'A'$ simply by splitting $A' = \left[A_1', A_2',\dots, A_k'\right]$ and computing $B'A_i'$ for $i=1,\dots,k$ using the regular sparse-sparse matrix multiplication for matrices in CCS format. This can be faster, than computing $(A')'\times B$ (i.e. converting the matrix $A$ to the CRS storage first, and splitting the resulting matrix by rows), when $S$ is sparse, but slower, when $S$ is nearly dense.

I would consider the following algorithm:

store the matrix $A$ in the CCS format, but split it by rows, i.e. $A=\text{col}(A_1, A_2, \dots, A_k)$.
compute $S_i = A_iB$ by the regular sparse-sparse matrix multiplication algorithm for matrices in the CCS format.
assembly the matrix $S = \text{col}(S_1, S_2, \dots, S_k)$ if required.

If $S_i$ is too large to be stored in RAM, then the matrix $B$ should also be split by columns. This algorithm does not require costly matrix transpositions, required partitioning is quite cheap, IO data transfer is low.

Huge matrix multiplication

There are 2 best solutions below

Related Questions in LINEAR-ALGEBRA

Related Questions in MATRIX-CALCULUS

Related Questions in MATRIX-DECOMPOSITION

Related Questions in SPARSE-MATRICES

Trending Questions

Popular # Hahtags

Popular Questions