time complexity for sparse matrix multiplication $XX^T$


I have a matrix $X\in \mathbb{R}^{m\times n}$ that is very sparse. Assume $\mathrm{nnz}(X) = D$, i.e. the number of non-zero elements is $D$. What is the time complexity of computing

$$A = XX^T$$

Can it be faster than $O(nm^2)$, e.g. linear in $D$?

BEST ANSWER

Short answer: the operation can be done in time as good as $O(mD)$.

Long answer: This all depends on the sparse matrix format. There are three big ones: Compressed sparse column (CSC) format, compressed sparse row (CSR) format, and triplet format.

In compressed sparse row format, for each row $i$ you store a list of column indices $J_i$ and values $V_i$, such that if $X_{ik}$ is the $d$th nonzero in row $i$, then $k = J_i[d]$ and $X_{ik} = V_i[d]$. What you actually store is the concatenated lists $J = [J_1,J_2,\dots,J_m]$ and $V = [V_1,V_2,\dots,V_m]$, together with a row-pointer array of cumulative offsets $I = [0,\ |J_1|,\ |J_1|+|J_2|,\ \dots]$, so that row $i$'s entries occupy positions $I[i-1]$ through $I[i]-1$ of $J$ and $V$.

In compressed sparse column format, the same thing happens with rows and columns switched. In triplet form, you simply store the row index, column index, and value of each nonzero as three parallel lists $I$, $J$, and $V$ (e.g. by concatenating the columns). The complexity mostly comes from looping through these lists, and you pick the format that is fastest for your operation.
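For concreteness, here is how these three arrays look for a tiny matrix, using scipy's `csr_matrix`, whose `indptr`, `indices`, and `data` attributes correspond to $I$, $J$, and $V$ above:

```python
import numpy as np
from scipy.sparse import csr_matrix

X = np.array([[0, 2, 0],
              [1, 0, 3]])
S = csr_matrix(X)

print(S.indptr)   # row pointers I   -> [0 1 3]
print(S.indices)  # column indices J -> [1 0 2]
print(S.data)     # values V         -> [2 1 3]
```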

I believe that for your operation, the CSR format is the best of the three. (I don't know whether there are even better schemes out there... probably there are!)

Then the code should look something like this:

    import numpy as np

    # I, J, V are the CSR arrays of X as defined above; I has length m+1
    A = np.zeros((m, m))
    for i in range(m):              # loop through rows of A
        for j in range(i + 1):      # only j <= i, since A is symmetric
            # column indices and values of the nonzeros in rows i and j of X
            J_i = J[I[i]:I[i+1]]
            V_i = V[I[i]:I[i+1]]
            J_j = J[I[j]:I[j+1]]
            V_j = V[I[j]:I[j+1]]
            # columns k in which both X[i,k] and X[j,k] are nonzero,
            # with idx_i (idx_j) the corresponding positions in J_i (J_j)
            J_both, idx_i, idx_j = np.intersect1d(J_i, J_j, return_indices=True)
            A[i, j] = np.dot(V_i[idx_i], V_j[idx_j])
            A[j, i] = A[i, j]

If you count each elementary operation as constant time, the computational complexity is $O(m\sum_{i=1}^m |V_i|) = O(mD)$: each intersection of the sorted index lists costs $O(|J_i| + |J_j|)$, and summing that over all $O(m^2)$ pairs gives $O(mD)$.

A valid question to ask is whether $|J_i\cap J_j| \ll \max\{|J_i|, |J_j|\}$ most of the time, and whether the analysis can be tightened in that scenario (e.g. when $X$ is such that most of the row nonzero indices don't overlap, as in a coding matrix). There, a different matrix format might be even better!

ANSWER

Note that the algorithm in the accepted answer has poor real-world performance: complexity $\neq$ performance. I'm also not sure about the analysis in that answer.

The fastest known way to multiply two sparse matrices $A$ and $B$ is to convert both to CSR or CSC format. Have a look at scipy's implementation for multiplying two CSR matrices. Multiplying two CSC matrices uses the same implementation together with the identity $(AB)^T=B^TA^T$.
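In scipy this fast path is what you get when both operands are already CSR; a small sketch (`scipy.sparse.random` is used here only to create test matrices):

```python
from scipy.sparse import random as sparse_random

A = sparse_random(100, 50, density=0.05, format='csr')
B = sparse_random(50, 80, density=0.05, format='csr')

C = A @ B  # dispatches to scipy's compiled CSR-times-CSR kernel
print(C.shape, C.format)
```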

When both matrices are in the same format, e.g. CSR, you can use the column indices of $A$ to index directly into the row-pointer array of $B$, which in turn points at the column indices of $B$. Essentially, every non-zero element that contributes to the output can be found directly by using values from one array to index into another.
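That index-chasing idea can be sketched in a few lines of Python. (The function name `csr_matmul` and the dense accumulator are my own simplifications; real implementations such as scipy's build the output in CSR form as well.)

```python
import numpy as np

def csr_matmul(Ap, Aj, Av, Bp, Bj, Bv, n_cols):
    """Row-by-row CSR x CSR product (Gustavson-style sketch)."""
    n_rows = len(Ap) - 1
    C = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        # for each nonzero A[i,k], jump straight into row k of B
        for a in range(Ap[i], Ap[i + 1]):
            k, v = Aj[a], Av[a]
            for b in range(Bp[k], Bp[k + 1]):
                C[i, Bj[b]] += v * Bv[b]
    return C

# A = [[1, 0, 2], [0, 3, 0]] and B = A^T, both given in CSR form
Ap, Aj, Av = [0, 2, 3], [0, 2, 1], [1, 2, 3]
Bp, Bj, Bv = [0, 1, 2, 3], [0, 1, 0], [1, 3, 2]
print(csr_matmul(Ap, Aj, Av, Bp, Bj, Bv, 2))  # [[5. 0.] [0. 9.]]
```

Note that no list intersection ever happens: each nonzero of $A$ leads directly to the relevant slice of $B$'s arrays.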

When the matrices are in opposite formats you cannot do this: for each element of the output matrix you must match corresponding entries in two lists, the column indices of a row of $A$ and the row indices of a column of $B$ (which is what the intersection does in the algorithm above). This is very slow.

It's a bit unfortunate, since the transpose of $A$ in CSR format is simply the same matrix in CSC format. $AA^T$ could also benefit from being symmetric, so we could avoid repeating the same computation for the triangles above and below the diagonal. But in order to multiply $A$ by $A^T$ efficiently, we have to convert $A^T$ from CSC to CSR (relatively fast), which effectively turns it into a completely different-looking matrix.

Also, in the worst case sparse matrix multiplication remains $O(nm^2)$, because there is no guarantee that the resulting matrix is also sparse. Consider the matrix

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 & 1\\ 1 & 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0\\ \end{bmatrix}$$

Then $AA^T$ is a dense matrix whose elements are all 1 except for the first, which is 5. All those elements need to be computed, so you must at least loop over the rows of $A$ twice to produce each element, yielding complexity $O(m^2x)$ where $x$ is the cost of computing a single element.

Supposing we follow the algorithm from the accepted answer, multiplying a row by a column requires at least a single pass over all their elements (see e.g. this answer for finding matching entries in two arrays). This means that in the worst case, when an entire row of $A$ and an entire column of $A^T$ are filled, $x = n$. If we have a random distribution of non-zero elements, then we expect the number of non-zeros per row to scale in proportion to $n$, so in the average case $x = kn$ with $k \ll 1$. But constant factors are discarded in complexity analysis, so we still get $O(nm^2)$ complexity.
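A quick scipy check of this example (using only the standard `scipy.sparse` API) confirms that the product is completely dense:

```python
import numpy as np
from scipy.sparse import csr_matrix

# the 4x5 example: ones in the first row and first column, zeros elsewhere
A = csr_matrix(np.array([[1, 1, 1, 1, 1],
                         [1, 0, 0, 0, 0],
                         [1, 0, 0, 0, 0],
                         [1, 0, 0, 0, 0]]))

G = (A @ A.T).toarray()
print(G)  # every entry is nonzero: 5 in the corner, 1 everywhere else
```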