How many entries are needed to determine a low-rank matrix?

39 Views Asked by At

Suppose $M$ is an $n \times m$ matrix of rank $r \ll n,m$. We know some uniform random sample $S$ of the entries. For example we know each entry with probability $p$. Or we know some uniformly selected $pnm$ of the entries.

I wonder how well do these entries do at determining $M$? By determine I mean $M$ is the only matrix that agrees with the sample over $S$? For example how large does $p$ need to be before $M$ is determined with high probability as $n,m \to \infty?$

Obviously there are examples when only $p \simeq 1$ will do. For example suppose $r=1$ and $M$ has one row $(1,2,3,\ldots, m)$ and all other rows full of zeroes. Then we need to sample all entries of the first row to reconstruct $M$.

However if we also assume all entries of $M$ are nonzero, then it seems sampling every entry is equally useful. To see this write $M = xy^T$ for some vectors $x,y$ of size $n,m$. Then sampling $M^i_j$ gives the relation $x_i= M^i_j y_j$. We need at least $n+m$ entries to get enough relations to solve for $n+m$ unknowns. But it is possible some set of that many relation has redundancies. Getting the right sort of chain leads to the following problem

Suppose $A,B$ are sets with $|A|=n $ and $|B| = m$ and we form a random bipartite graph $G$ on vertex set $V=A \cup B$ by including some of the edges $\{(a,b):a \in A, b \in B\}$. How likely is $G$ to be connected?

I don't know the answer to this question. Does anyone here know?

More generally we have $M=XY$ where $X$ is $n \times r$ and $Y$ is $r \times m$. I believe the correct generalisation of nonzero entries is that all sets of $r$ rows(columns) of $X$, $Y$ are independent. (I am also happy to make any assumptions that holds for some dense open subset.) This lets us take say $X^1 Y_1 = M^1_1 , X^1 Y_2 = M^1_2,\ldots, X^1 Y_r = M^1_r$ and invert the matrix to get $X^1$ in terms of $Y_j$.

This is where I get stuck. I have no idea how to even convert the problem into something on graphs. Any ideas?