Is there an analytic way to find the expected Hamming distance between two randomly sampled binary (zero or one) strings of length n, where each string must have exactly k ones and n-k zeros (so there are n choose k such strings)?
For example, given a 5 digit string 01101 which matches the above requirement of (n-k)=2 zeros, (k)=3 ones, what is the average hamming distance of this string and all other samples drawn from this set? Ultimately I'm using much larger numbers, so simply listing possibilities will not be sufficient.
Thanks in advance.
Yes. We can recast this problem as the following game: an urn contains $n$ balls, $k$ black and $n-k$ white. You draw $k$ balls from the urn without replacement and count how many of them are white.
The correspondence between this game and your problem is this: given a bitstring with $k$ ones and $n-k$ zeroes, you randomly generate another bitstring with $k$ ones and $n-k$ zeroes. To compute the Hamming distance, just look at the $k$ indices of the new string where the old string has ones. (This amounts to drawing $k$ balls out of the urn to see what the corresponding bits of the new random bitstring are.) If you count $a$ zeroes among those positions, then the Hamming distance between the two strings is $2a$: there are $a$ positions where the old string has a one and the new string has a zero, and correspondingly $a$ positions where the old string has a zero and the new string has a one, since the new string's $a$ "displaced" ones must land somewhere.
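This correspondence is easy to check numerically. Below is a minimal sketch using the question's string `01101` and an arbitrarily chosen second string with the same weight (the string `11010` is my own example, not from the question):

```python
# Hamming distance between two equal-weight bitstrings equals
# 2a, where a = number of positions holding a 1 in the old string
# but a 0 in the new string.

def hamming(s, t):
    return sum(x != y for x, y in zip(s, t))

old = "01101"  # k = 3 ones, n - k = 2 zeros
new = "11010"  # another string with 3 ones

# a = zeros in `new` at the positions where `old` has ones
a = sum(new[i] == "0" for i, ch in enumerate(old) if ch == "1")

print(hamming(old, new), 2 * a)  # prints: 4 4
```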
To compute the expected Hamming distance, let $a$ be the number of white balls you drew. Each ball drawn has an $(n-k)/n$ chance of being white (the draws are not independent, but linearity of expectation doesn't require independence). Therefore the expected value of $a$ is $k(n-k)/n$.
Since the Hamming distance is $2a$, the expected Hamming distance is twice this value: $$E[H] = \frac{2k(n-k)}{n}$$
You can verify that this formula has the right symmetry and agrees with what you'd expect in extreme cases, e.g. $k=0$, $k=n$, and $k=n/2$.
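For the example in the question ($n=5$, $k=3$) the formula gives $2\cdot 3\cdot 2/5 = 2.4$, and the set is small enough to brute-force. A sketch, representing each string by the set of positions of its ones so that the Hamming distance is the size of the symmetric difference:

```python
# Brute-force check of E[H] = 2k(n-k)/n for n = 5, k = 3.
# Feasible only for small n, which is why the closed form matters.
from itertools import combinations

n, k = 5, 3
# Each string <-> the frozenset of indices where it has a 1.
strings = [frozenset(c) for c in combinations(range(n), k)]

def hamming(s, t):
    # Differing bits are exactly the symmetric difference
    # of the two sets of one-positions.
    return len(s ^ t)

avg = sum(hamming(s, t) for s in strings for t in strings) / len(strings) ** 2
print(avg, 2 * k * (n - k) / n)  # prints: 2.4 2.4
```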
Indeed, if you'd rather specify the fraction $p\equiv k/n$ of bits equal to 1, the expected Hamming distance can be written as $$E[H] = 2np(1-p)$$