Expected sum of Hamming distances in set of random strings

467 Views Asked by At

Thee Hamming distance $H(S_1, S_2)$ between two binary strings $S_1, S_2$ of length $n$ is the number of positions on which the two strings disagree. It is straightforward to show that if $S_1, S_2$ are two random strings of length $n$, the expected Hamming distance $\mathbb{E}[H(S_1, S_2)]$ is $n/2$.

Now, say I'd like to have a set of $k$ strings $S =\{S_1, \ldots, S_k\}$ such that for every $S_i \in S$, the average distance between $S_i$ and the other strings of $S$ is at least $n/2$. In other words, for every $S_i \in S$, $\sum_{j \neq i} H(S_i, S_j) \geq (k - 1)n/2$.

Actually, I'm not really interested in constructing $S$. Rather, I'm interested in whether the following argumentation, which shows that $S$ exists, is valid - for it seems to me that there is a hole to fill. This came up during an informal discussion on math stuff, and I couldn't refute the argument on the spot. Here's the argument.

We show that such an $S$ exists by picking $S_1, \ldots, S_k$ randomly (uniformly and independently). Then for any $S_i \in S$, the exected sum of distances $\mathbb{E}[\sum_{i \neq j} H(S_i, S_j)] = (k - 1) \mathbb{E}[H(S_i, S_j)] = (k - 1)n/2$, by linearity of expectation. Thus there is a set of strings $S$ such that $\sum_{i \neq j}H(S_i, S_j) \geq (k - 1)n/k$. Since this holds for every $S_i \in S$, we deduce that there exists a set of strings $S$ such that every string $S_i \in S$ has $\sum_{i \neq j}H(S_i, S_j) \geq (k - 1)n/2$

The part that's bugging me is the end, which basically says "since this holds for every $S_i$, then there is a $S$ in which every $S_i$ has the desired sum". The argument shows that for any $S_i$, there is a $S$ having the desired sum. But does this show that there is a single $S$ satisfying every $S_i$ simultaneously?

1

There are 1 best solutions below

3
On BEST ANSWER

You're right, it doesn't. The argument merely shows that there are $k$ possibly different sets that each have the desired property for one of the $S_i$.