Given $2^k$ random binary sequences of length $k$, how many are expected to be distinct?

87 Views Asked by At

For example, say you get a uniformly random batch of $256$ sequences of $8$ bits each, and count the number of distinct sequences (i.e. discard duplicates then count), and repeat this experiment indefinitely. In the limit, what's the ratio of distinct sequences to total sequences?

Empirically, irrespective of choice of $k$, this seems to be converging to something around $0.63214$, so as a pure guess, I'm thinking maybe $1-\frac{1}{e}$, but I'm sure someone on here will immediately know the answer to this one.

2

There are 2 best solutions below

0
On BEST ANSWER

Let $$ X_i=\cases{1&if the sequence $\ i\ $ appears amongst the $256$ drawn\\ 0&otherwise .} $$ Then the ratio you're asking for is \begin{align} E\Big(\frac{1}{256}\sum_\limits{i=0}^{255}X_i\Big)&=E\big(X_0\big)\ \ \text{(since all the $\ X_i\ $ are identically distributed)}\\ &=P\big(X_0=1\big)\\ &=1-\Big(1-\frac{1}{256}\Big)^{256}\\ &\approx0.63284\ , \end{align} which differs from $$ 1-\frac{1}{e}\approx0.63212 $$ in the fourth decimal place.

0
On

The number of copies of a given sequence will closely follow a Poisson distribution with mean $1$ because you have $256$ independent tries with probability $\frac 1{256}$. The probability of $0$ events in this distribution is $\frac 1e$, so the expected fraction of strings that are represented is $1-\frac 1e$