Expected number of pairs with Hamming distance $d$ for a sample of $k$ random bit strings of length $n$

134 Views Asked by At

Say we were to uniformly sample $k$ times from a bit string with length $n$. What is the expected number of pairs with a Hamming distance $d$? In the limit of Hamming distance 0, I realize this becomes a birthday collision problem, but I am interested in the general case.

For example, say we generate 1000 random 10-bit strings, what is the expected number of strings with a Hamming distance of 3?

I found several related posts: Probability that a set of 'N' random binary strings are all at least a certain Hamming distance 'k' apart , birthday problem near misses , and Average number of strings with edit distance exactly 2 but I could not quite make the additional leap to answer my similar but not exactly the same question.

Thanks in advance for your help!

2

There are 2 best solutions below

1
On BEST ANSWER

Actually this seems quite easy

Let $A^d_{i,j}$ be the event that two strings $s_i$ $s_j$ have distance $d$. Then, if both $s_i,s_j$ are random, we have $$P(A^d_{i,j})= \binom{n}{d} 2^{-n}$$

Then, if we sample $k$ random strings, the expected number of pairs with such distance is

$$ \frac{k(k-1)}{2} \binom{n}{d} 2^{-n}$$

0
On

Just adding extra clarity for myself here based on leonbloy's answer:

The probability that a pair of strings has Hamming distance $d$ follows a binomial distribution: $$ P(X=d)=\binom{n}{d}(1/2)^d (1/2)^{n-d} = \frac{1}{2^n}\binom{n}{d} $$

as explained by Probability that hamming distance is d.

Then, this probability is multiplied by the total number of pairings that can be made in a set of $k$ items. This can be found from the series: $(k-1) + (k-2) + ... (k-(k-1))$ which reduces to $\frac{k(k-1)}{2}$