Test symmetricity for a sparse matrix

630 Views Asked by At

I have a sparse matrix in LIL (List of Lists) format. I want to test whether the matrix is symmetric or not. Let's say I have n non-default elements. What's the best complexity in terms or space and time I can achieve?

The best solution I have came up with so far is to traverse the matrix, looking at each non-default element (consisting of row, col and value) and storing that element in a hash-table and then checking whether two elements Element$(i, j, v_1)$ and $(j, i, v_2)$ have the same value or not. The time complexity for this is $\mathcal{O}(n)$ and space complexity is also $\mathcal{O}(n)$. It looks to me no scope of improvement in time complexity but I wonder whether space complexity could be improved or not without compromising the time complexity.

2

There are 2 best solutions below

0
On

For sparse matrices we often uses either compressed sparse row format (CSR) or compressed sparse column format (CSC).

In CSR format there are three array, XADJ[n+1], ADJ[nnz] and VAL[nnz]. Information about the nonzero entries on the ith row is stored in at in a contiguous chunk of ADJ and VAL starting at index XADJ[i]. Specifically, ADJ[k] is the column index of the kth nonzero entry of the matrix and VAL[k] is the value of the kth nonzero. Normally, XADJ[n+1]=nnz stores the number of non-zero entries of the matrix. In this manner, CSR format allows convenient access to the rows of matrix. Similar, CSC format allows convenient access to the columns of the matrix

In your case, you can benefit from using a hybrid format. Specifically use CSR format for the strictly upper triangular part of the matrix and CSC format for the strictly lower triangular format, storing the diagonal entries in a separate array. This will allow you to determine that the matrix is symmetric in time proportional to the number of non-zeros and you can do the comparison in place with no extra storage.

You will have to adapt your other routines (sparse matrix vector multiply and LU factorization) to this format and that extra work may not be worth the effort.

Regardless, if you are using pointers and dynamic memory allocation to implement your linked list, then you stand to gain tremendously by switching to the more compact representation afforded by either CSR or CSC.

I hope this helps.

7
On

Just an idea Maybe you can try and take a few random vectors, transpose and multiply from left and right separately and compare the resulting vectors. The probability that they become the same by chance for any non-symmetric matrix should diminish quite fast - but they must be exactly the same in all elements including precision if matrix is symmetric. Of course you will be 100% sure (up to numerical precision) after having tried $m$ (given $m \times m$ matrix) of them if that is necessary.


EDIT To expand a little, I did a small script randomizing a double precision sparse symmetric $m = 50$, with $n=200$, probability that an any one element gets altered ( destroying symmetricity ) is $1\%$ for each element.

Out of 1024 such randomized matrices and single random vectors such we successfully found $667$ which were correctly non-symmetric and $8$ which missed. We would expect $1- 0.99^{100} \approx 0.63397$ which is close to $\frac{667}{1024} \approx 0.63672$.


NEW EDIT I found that the 8 failed ones above occured when accidentally altering a diagonal element, now it consistently finds all non-symmetric matrices $100\%$ on the first vector try.


Analysis of complexity

The memory complexity will be very nice as you can compute each element at a time and compare pairwise and just store a boolean "have we differed yet?". Computational complexity for one core is $\mathcal{O}(m)$, but for k cores it will be $\mathcal{O}\left(\frac{m}{k}\right)$ as it is massively parallellizable, just let each core compute $m/k$ element pairs. Memory requirements is need to store two element, which should easily fit into registers of modern cpus. Then "or"-ing all the cores' booleans tells us if one (or more) have differed yet and then we can stop execution. No need for locking memory as we can only go from false to true and not the other way. So we are done when each thread finishes or as fast as the first thread finds a difference.


I have mostly worked in applied sciences with floating point matrices and not considered integers or finite fields of integers, so maybe that is a different story.