What is the expectation of the rank of a matrix with a 1 at each column?

492 Views Asked by At

Say a random square matrix $A\in\mathbb{R}^{n\times n}$, each column of $A$ has exactly one nonzero element being 1, i.e. each column looks like $e_i=\{0,\dots,1,\dots,0\}^\top$. Say for each column, the position of 1 is i.i.d. draw from the uniform distribution over $\{1,2,\dots,n\}$. What will the distribution of $rank(A)$ look like? What is its expectation? Will it concentrate near its expectation?

2

There are 2 best solutions below

0
On

The rows of $A$ are pairwise orthogonal, so the rank of $A$ is the number of nonzero rows. Each row is nonzero with probability $1-(1-1/n)^n$, so the expected rank is $n(1-(1-1/n)^n)\approx n(1-1/e)\approx0.632\times n$ for large $n$. For small $n$, we have the values $1,\frac32,\frac{19}{9},\frac{175}{64},\ldots$.

(Besides the expectation, I'm not sure what the rest of the distribution looks like.)

0
On

The rank is the number of rows containing at least one $1$. We will use the method of indicator variables to find the mean and variance of the rank. It turns out that the distribution does concentrate near the mean.

Let $$I_i = \begin{cases} 1 \qquad \text{if row } i \text{ contains at least one 1} \\ 0 \qquad \text{otherwise} \end{cases}$$ for $i = 1,2,3, \dots ,n$, and let $$X = \sum_{i=1}^n I_i$$ so $X$ is the rank of the matrix. We have $$P(I_i = 0) = \left( \frac{n-1}{n} \right)^n = (1-1/n)^n$$ so $P(I_i = 1) = 1- (1-1/n)^n$, and the mean of $X$ is $$\mu = E(X) = E \left( \sum_{i=0}^n I_i \right) = \sum_{i=0}^n E( I_i) = n [ 1- (1-1/n)^n]$$ We know $(1-1/n)^n \to e^{-1}$ as $n \to \infty$, so $\mu \sim (1-e^{-1})n$.

To find the variance of $X$, we need to first find $E(\sum_{i<j} I_i I_j)$. So notice that $$\begin{align} P(I_i I_j = 0) &= P(I_i = 0) + P(I_j = 0) - P((I_i = 0) \cap (I_j=0)) \\ &= 2 (1-1/n)^n - (1-2/n)^n \end{align}$$ for $i < j$, so $P(I_i I_j = 1) = 1 -2 (1-1/n)^n + (1-2/n)^n$. Therefore $$E \left( \sum_{i<j} I_i I_j \right) = \sum_{i<j} E(I_i I_j) = \frac{n(n-1)}{2} [ 1 -2 (1-1/n)^n + (1-2/n)^n ] $$ Because of the identity $$\left( \sum_{i=0}^n I_i \right )^2 = \sum_{i=0}^n I_i^2 + 2 \sum_{i<j}I_i I_j = \sum_{i=0}^n I_i + 2 \sum_{i<j}I_i I_j$$ we have $$X^2 = X + 2 \sum_{i<j}I_i I_j$$ so $$\begin{align} E(X^2) &= E(X) + 2 \sum_{i<j} E(I_i I_j) \\ &= n [ 1- (1-1/n)^n] + n(n-1) [ 1 -2 (1-1/n)^n + (1-2/n)^n ] \end{align}$$ So the variance of $X$ is $$\begin{align} \sigma^2 &= E(X^2) - (E(X))^2 \\ &= n(1-1/n)^n - n^2(1-1/n)^{2n} + n(n-1)(1-2/n)^n \end{align}$$ after some simplification. It turns out (see below) that $\sigma^2 \sim (e^{-1} - 2 e^{-2}) n$, so $$\frac{\sigma}{\mu} \sim \frac{\sqrt{e^{-1}-2 e^{-2}}}{1-e^{-1}} \frac{1}{\sqrt{n}}$$ and the distribution does concentrate near the mean for large $n$.


The asymptotic behavior of $\sigma^2$ for large $n$:

From above, we have $$\begin{align} \frac{\sigma^2}{n} &= (1-1/n)^n - n (1-1/n)^{2n} + (n-1)(1-2/n)^n \\ &= (1-1/n)^n - (1-2/n)^n+ n[(1-2/n)^n-(1-1/n)^{2n}] \end{align}$$ so $$\lim_{n \to \infty} \frac{\sigma^2}{n} = e^{-1} - e^{-2} + \lim_{n \to \infty} n[(1-2/n)^n -(1-1/n)^{2n}] \tag{*}$$ To evaluate the last limit above, notice that $$\lim_{n \to \infty} n[(1-2/n)^n -(1-1/n)^{2n}] = \lim_{x \to 0} \frac{1}{x} [(1-2x)^{1/x} - (1-x)^{2/x}] $$ and $$\begin{align} (1-2x)^{1/x} &= \exp( (1/x) \ln(1-2x)) \\ &= \exp((1/x)(-2x -2x^2 +o(x^3)) \\ &= \exp(-2 -2x +o(x^2)) \\ &= e^{-2} \exp(-2x +o(x^2)) \\ &= e^{-2} (1-2x) + o(x^2) \end{align}$$ Similarly, $$\begin{align} (1-x)^{2/x} &= \exp((2/x) \ln(1-x) \\ &= \exp((2/x) (-x -(1/2)x^2 +o(x^3)) \\ &= \exp(-2-x+o(x^2)) \\ &= e^{-2} \exp(-x +o(x^2)) \\ &= e^{-2} (1-x) + o(x^2) \end{align}$$ So $$\begin{align} \lim_{x \to 0} \frac{1}{x} [(1-2x)^{1/x} - (1-x)^{2/x}] &= \lim_{x \to 0} \frac{1}{x} [e^{-2}(1-2x) +o(x^2) - e^{-2}(1-x) + o(x^2)] \\ &= \lim_{x \to 0} (-e^{-2} + o(x)) \\ &= -e^{-2} \end{align}$$ Substituting into (*), we have $$\lim_{n \to \infty} \frac{\sigma^2}{n} = e^{-1} - e^{-2} - e^{-2} = e^{-1} -2 e^{-2}$$ so $\sigma^2 \sim (e^{-1} - 2 e^{-2}) n$, as was claimed above.