Large invertible submatrix in random sparse matrices


The Problem

Informal statement of the problem: consider a natural distribution $D_n$ over very sparse $n\times n$ matrices over the binary field $\mathbb{F}_2$; that is, a matrix sampled from $D_n$ contains a constant number of $1$s per row. How likely is a matrix sampled from $D_n$ to contain a large invertible submatrix?

More formally, consider the following two "natural" distributions over matrices in $\mathsf{M}_{n\times n}(\mathbb{F}_2)$:

  • $D^0_n$ samples each entry of the matrix independently from the Bernoulli distribution with probability $p_n = d/n$, where $d$ is a small constant (that is, each entry of a sampled matrix is $1$ with probability $d/n$, and $0$ with probability $1-d/n$).
  • $D^1_n$ samples each row of the matrix independently; to sample the $i$th row, pick a uniformly random size-$d$ subset $S_i$ of $[1,\cdots, n]$. The subset $S_i$ gives the positions of the $1$s in the $i$th row (and $[1,\cdots, n]\setminus S_i$ the positions of the $0$s).
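For concreteness, here is a minimal Python/NumPy sketch of how both distributions can be sampled (the function names are my own):

```python
import numpy as np

rng = np.random.default_rng()

def sample_d0(n, d):
    """D^0_n: every entry is 1 independently with probability d/n."""
    return (rng.random((n, n)) < d / n).astype(np.uint8)

def sample_d1(n, d):
    """D^1_n: each row has exactly d ones, at uniformly random positions."""
    A = np.zeros((n, n), dtype=np.uint8)
    for i in range(n):
        A[i, rng.choice(n, size=d, replace=False)] = 1  # the subset S_i
    return A
```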

Then, consider the following conjecture (with $b=0$ and $b=1$ giving two variants, depending on the choice of the distribution): there exist constants $\gamma<1$ and $N$ such that for any $n > N$, the probability that a random matrix sampled from $D^b_n$ contains an invertible submatrix of size $\gamma n \times \gamma n$ is at least $1-f(n)$. Here, $f(n)$ is some fixed function that goes to $0$ as $n$ grows, e.g. an inverse polynomial or an inverse exponential.

Question

What is known about the above conjecture? Is it known to hold for either of $D^0_n, D^1_n$? Alternatively, is it known to hold if we relax the requirement to containing an invertible submatrix of size $\gamma n \times \gamma n$ with constant probability (instead of a probability that goes to $1$ as $n$ increases)?

I'm specifically interested in the case $d = 5$ (that is, the sparse matrices have on average five $1$s per row), in case this simplifies the problem in any way. I would also be happy with any partial answer, giving pointers to relevant literature, relating the problem to existing problems, or providing any clues.

Thoughts on the question

I have not found any literature directly addressing the problem above. There is, however, a rather large body of research on the rank of less sparse random matrices, namely $n\times n$ matrices containing an average of $O(\log n)$ $1$s per row (as opposed to a constant number in my setting).

More specifically, this paper shows that the rank of random (Bernoulli) sparse matrices is $n - O(1)$ with probability $1-o(1)$, where the Bernoulli probability is $p_n = c\log n/n$ for some constant $c>0$. A similar result for any field is proven in this other paper. However, it is not clear to me whether these results could be extended to guarantee a rank of $\gamma\cdot n$ for some constant $\gamma$, with probability $1-o(1)$, in the setting $p_n = d/n$ for a small constant $d$. Furthermore, I also do not know whether these results can be extended to the stronger statement that a random sparse matrix contains a large invertible submatrix (as opposed to merely having large rank).

In another paper (which I could not find again), there was a more fine-grained analysis of the rank of a random $O(\log n)$-sparse matrix over $\mathbb{F}_2$. Unfortunately, replacing $O(\log n)$ by a constant in their calculations only yields the statement that the number of dependencies in a random sparse matrix is at most $O(n)$ (with probability $1-o(1)$); as far as I could see, it does not allow one to prove the stronger statement that the number of dependencies is upper bounded by $(1-\gamma)\cdot n$ for some constant $0<\gamma<1$. And this is still only about the rank anyway.

Note

I've not verified this specific conjecture experimentally, but I've done so (well, some coauthors of mine have) for a more complex distribution over sparse matrices (for context, it arose when analyzing the success probability of a subexponential-time attack on a cryptographic pseudorandom generator), and the conjecture did seem to hold (with a constant $\gamma$ above $0.9$) for large values of $n$ (a few hundred). The sampled sparse matrix always contained a $\gamma n\times \gamma n$ invertible submatrix in our experiments; the same seems very likely to hold for the simpler distributions I consider in this question.
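For anyone wishing to reproduce such experiments: since a matrix of rank $r$ over $\mathbb{F}_2$ always contains an $r\times r$ invertible submatrix (take $r$ independent rows, then $r$ independent columns among them), a witness submatrix can be extracted explicitly. A sketch (my own code, not from the experiments mentioned above):

```python
import numpy as np

def gf2_row_basis(A):
    """Indices of original rows of A forming a basis of its row space over GF(2)."""
    pivots = {}   # pivot column -> stored reduced row
    basis = []
    for i, row in enumerate(A):
        r = row.astype(np.uint8).copy()
        for c in sorted(pivots):       # reduce against stored pivot rows
            if r[c]:
                r ^= pivots[c]
        nz = np.flatnonzero(r)
        if nz.size:                    # row is independent of those kept so far
            pivots[int(nz[0])] = r
            basis.append(i)
    return basis

def invertible_submatrix(A):
    """(row indices, column indices) of a rank(A) x rank(A) invertible submatrix."""
    rows = gf2_row_basis(A)
    cols = gf2_row_basis(A[rows].T)    # independent columns of the selected rows
    return rows, cols
```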

2 Answers

Accepted answer

If all you want is to show that there is an invertible submatrix whose size is a constant fraction of $n$, one quick way to do it is to look at columns containing exactly one nonzero entry. With high probability these form a constant fraction ($\approx de^{-d}$) of the columns, and with high probability their nonzero entries are scattered over a constant fraction of the rows, giving you a permutation submatrix of size $\gamma n$. But there are some stronger results known.
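This "quick" argument is easy to run directly; a sketch (my own code, using the Bernoulli model):

```python
import numpy as np

def permutation_submatrix(A):
    """Select columns with exactly one 1, keeping at most one column per row;
    the resulting submatrix is a permutation matrix, hence invertible over
    any field."""
    rows, cols, used = [], [], set()
    for j in np.flatnonzero(A.sum(axis=0) == 1):
        i = int(np.flatnonzero(A[:, j])[0])   # the unique nonzero row of column j
        if i not in used:
            used.add(i)
            rows.append(i)
            cols.append(int(j))
    return rows, cols

rng = np.random.default_rng()
n, d = 2000, 5
A = (rng.random((n, n)) < d / n).astype(np.uint8)
rows, cols = permutation_submatrix(A)
# len(rows)/n should concentrate near d*e^{-d} (about 0.034 for d = 5)
```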

For symmetric matrices, the rank is well understood in your first model thanks to the work of Bauer and Golinelli for $d<e$ and of Bordenave, Lelarge, and Salez for $d>e$. The idea is to think of the matrix as the adjacency matrix of a graph and make the following observation (a variant on the "quick" argument above): if a vertex has exactly one neighbor in the graph (exactly one non-zero entry in the corresponding column of the matrix), deleting both the vertex and its neighbor from the graph reduces the rank of the underlying matrix by exactly $2$, regardless of the field. If $d<e$, you can continue this process until only $o(n)$ non-zero rows remain, so the rank is asymptotically twice the number of removed pairs of vertices. Bordenave, Lelarge, and Salez showed that if $d>e$, the remaining "core" matrix, in which all rows/columns have at least two non-zero entries, is asymptotically of full rank (albeit over $\mathbb{R}$ instead of $\mathbb{F}_2$). I'm not sure whether there have been corresponding studies in the non-symmetric case.
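The leaf-removal process described above can be sketched as follows (my own code; twice the returned pair count lower-bounds the rank over any field, by the observation above):

```python
import numpy as np

def peel_pairs(adj):
    """Repeatedly delete a degree-1 vertex together with its unique neighbor.
    Each deletion lowers the rank of the (symmetric, zero-diagonal) adjacency
    matrix by exactly 2, regardless of the field."""
    adj = adj.copy()
    pairs = 0
    while True:
        deg = adj.sum(axis=1)
        leaves = np.flatnonzero(deg == 1)   # deleted vertices have degree 0
        if leaves.size == 0:
            return pairs
        v = int(leaves[0])
        u = int(np.flatnonzero(adj[v])[0])  # the unique neighbor of the leaf v
        for w in (v, u):                    # delete both endpoints
            adj[w, :] = 0
            adj[:, w] = 0
        pairs += 1
```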

Your second model has been well studied in computer science under the name "random XOR-SAT". Pittel and Sorkin (building on earlier work of Dubois and Mandler for the case $d=3$) gave a precise threshold value of $c$ such that a random $cn \times n$ matrix contains an $n \times n$ invertible submatrix with probability $1-o(1)$ (based on the behavior of the same "remove columns with exactly one nonzero entry" peeling process as above). This implies a lower bound of $\gamma = \frac{1}{c}$ for your problem (just delete everything but the first $\gamma n$ rows), which isn't sharp. I wouldn't be surprised if the actual answer were whatever results from performing some analogue of the peeling process on $n \times n$ matrices.
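The peeling process can also be used constructively on an $n\times n$ matrix: taken in peeling order, the selected (row, column) pairs index an upper-triangular submatrix with unit diagonal, which is invertible over $\mathbb{F}_2$. A sketch (my own code, illustrating this analogue rather than the Pittel–Sorkin analysis itself):

```python
import numpy as np

def peel_invertible(A):
    """Greedily pick a column whose only 1 among surviving rows lies in a
    surviving row; remove that row and column, and repeat. In peeling order,
    the selected entries form an upper-triangular submatrix with unit
    diagonal, hence invertible over GF(2)."""
    A = A.astype(np.uint8)
    row_alive = np.ones(A.shape[0], dtype=bool)
    col_alive = np.ones(A.shape[1], dtype=bool)
    rows, cols = [], []
    while True:
        colsum = A[row_alive].sum(axis=0)            # column weights over surviving rows
        cand = np.flatnonzero(col_alive & (colsum == 1))
        if cand.size == 0:
            return rows, cols
        j = int(cand[0])
        i = int(np.flatnonzero(row_alive & (A[:, j] == 1))[0])
        rows.append(i)
        cols.append(j)
        row_alive[i] = False
        col_alive[j] = False
```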

Answer

When reading your question I was very skeptical, but after a few computational tries, I realized that I was wrong. Here are some ideas.

Take $n=200$, $d=3$, and consider a random matrix $A=[a_{i,j}]$; on average, it has $10$ zero rows (a row is all-zero with probability $(1-3/200)^{200}\approx e^{-3}\approx 0.05$). Then, with good probability, $\operatorname{rank}(A)\leq 190$. We then consider the submatrices $B$ of $A$ of dimension $0.9n=180$.

If $\det(B)=1$ for such a $B$, then necessarily (by the Leibniz expansion of the determinant) there exists a permutation $\pi$ of a size-$180$ subset of $\{1,\cdots,200\}$ such that $\prod_i a_{i,\pi(i)}=1$. Estimating the number of such permutations (when they exist) is a tree-search problem.

For $n=200$ and submatrices of size $180$, such permutations exist almost all the time. To each admissible permutation we can associate a matrix $B$; roughly speaking, for each $B$, $\operatorname{prob}(\det(B)=1)\approx 1/2$. Then, if we obtain $k$ such matrices $B$, the probability that at least one is invertible is $\approx 1-1/2^k$. In fact, here $\operatorname{prob}(\operatorname{rank}(A)\geq 180)\approx 0.89$.
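That last probability is easy to estimate empirically; a Monte Carlo sketch (my own code, Bernoulli model with $n=200$, $d=3$):

```python
import numpy as np

def gf2_rank(A):
    """Rank over GF(2) by Gaussian elimination."""
    A = A.astype(np.uint8).copy()
    m, n = A.shape
    r = 0
    for c in range(n):
        piv = np.flatnonzero(A[r:, c])
        if piv.size == 0:
            continue
        p = r + int(piv[0])
        A[[r, p]] = A[[p, r]]                    # move the pivot row up
        below = np.flatnonzero(A[r + 1:, c]) + r + 1
        A[below] ^= A[r]                         # clear the column below the pivot
        r += 1
        if r == m:
            break
    return r

rng = np.random.default_rng()
n, d, trials = 200, 3, 100
hits = sum(gf2_rank((rng.random((n, n)) < d / n).astype(np.uint8)) >= 180
           for _ in range(trials))
print(hits / trials)   # empirical estimate of prob(rank(A) >= 0.9 n)
```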