Is there an analytic way to find the expected Hamming distance between two randomly sampled binary (zero or one) strings of length n, where each string must have exactly k ones and n-k zeros (so there are n choose k such strings)?
For example, given a 5 digit string 01101 which matches the above requirement of (n-k)=2 zeros, (k)=3 ones, what is the average hamming distance of this string and all other samples drawn from this set? Ultimately I'm using much larger numbers, so simply listing possibilities will not be sufficient.
Thanks in advance.
Yes. We can recast this problem as the following game: an urn contains $n$ balls, $k$ black and $n-k$ white. You draw $k$ balls from the urn without replacement and count how many of them are white.
The correspondence between this game and your problem is this: given a bitstring with $k$ ones and $n-k$ zeroes, you randomly generate another bitstring with $k$ ones and $n-k$ zeroes. To compute the Hamming distance, just look at the $k$ indices of the new string where the old string has ones. (This amounts to drawing $k$ balls out of the urn to see what the corresponding bits of the new random bitstring are.) If you count $a$ zeroes among those positions, then the Hamming distance between the two strings is $2a$: there are $a$ positions where the old string has a one and the new string has a zero, and correspondingly $a$ positions where the old string has a zero and the new string has a one, since the new string's $a$ "displaced" ones must land somewhere.
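This correspondence is easy to check numerically. Below is a minimal sketch using the question's string `01101` and an arbitrarily chosen second string with the same weight (the string `11010` is my own example, not from the question):

```python
# Hamming distance between two equal-weight bitstrings equals
# 2a, where a = number of positions holding a 1 in the old string
# but a 0 in the new string.

def hamming(s, t):
    return sum(x != y for x, y in zip(s, t))

old = "01101"  # k = 3 ones, n - k = 2 zeros
new = "11010"  # another string with 3 ones

# a = zeros in `new` at the positions where `old` has ones
a = sum(new[i] == "0" for i, ch in enumerate(old) if ch == "1")

print(hamming(old, new), 2 * a)  # prints: 4 4
```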
To compute the expected Hamming distance, let $a$ be the number of white balls you drew. Each ball drawn has an $(n-k)/n$ chance of being white (the draws are not independent, but linearity of expectation doesn't require independence). Therefore the expected value of $a$ is $k(n-k)/n$.
Since the Hamming distance is $2a$, the expected Hamming distance is twice this value: $$E[H] = \frac{2k(n-k)}{n}$$
You can verify that this formula has the right symmetry and agrees with what you'd expect in extreme cases, e.g. $k=0$, $k=n$, and $k=n/2$.
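For the example in the question ($n=5$, $k=3$) the formula gives $2\cdot 3\cdot 2/5 = 2.4$, and the set is small enough to brute-force. A sketch, representing each string by the set of positions of its ones so that the Hamming distance is the size of the symmetric difference:

```python
# Brute-force check of E[H] = 2k(n-k)/n for n = 5, k = 3.
# Feasible only for small n, which is why the closed form matters.
from itertools import combinations

n, k = 5, 3
# Each string <-> the frozenset of indices where it has a 1.
strings = [frozenset(c) for c in combinations(range(n), k)]

def hamming(s, t):
    # Differing bits are exactly the symmetric difference
    # of the two sets of one-positions.
    return len(s ^ t)

avg = sum(hamming(s, t) for s in strings for t in strings) / len(strings) ** 2
print(avg, 2 * k * (n - k) / n)  # prints: 2.4 2.4
```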
Indeed, if you'd rather specify the fraction $p\equiv k/n$ of bits equal to 1, the expected Hamming distance can be written as $$E[H] = 2np(1-p)$$