Probability of any k positions matching across a set of n random binary strings of length L?

432 Views Asked by At

What's the probability of the same k positions matching for ALL strings in the set of n random binary strings of length L? Note that the probability for 0 occurring at any position for any string is 1/2 and the probability for 1 is 1/2.

This is a possible event that has to be included in the probability: 00100, 01110, 01100, 01010 In this example only the first and last positions match for all the strings. This example would be one possible event of many considered for the probability that the same k=2 positions match for all strings in the set of n=4 random binary strings of length L=5.

I'm trying to figure out the generalizable probability formula so I can scale it up.

I can do the problem when there are just 2 strings, but once I have more the combinations confuse me. The concept of Hamming distance seems relevant but it only considers the similarities in the same position of 2 strings instead of considering the same position across the set of strings.

This question seems very similar but it seems to only care about position similarities of any pair of sequences in the set instead of the position similarities shared for every string in the set. Probability that a set of 'N' random binary strings are all at least a certain Hamming distance 'k' apart

1

There are 1 best solutions below

5
On BEST ANSWER

The probability that all $n$ strings match at a particular position is $(1/2)^{n-1}$. Therefore, the number of positions where they all match is distributed like $\operatorname{Bin}(L,(1/2)^{n-1})$, so the probability there are exactly $k$ such positions is $$ \binom{L}{k}((1/2)^{n-1})^k(1-(1/2)^{n-1})^{L-k}. $$ If you want the probability that there are at least $k$ positions where all strings match, then the answer would be $$ \sum_{i=k}^L\binom{L}{i}((1/2)^{n-1})^i(1-(1/2)^{n-1})^{L-i}. $$