Calculating Expected Hamming Distance

In my Statistics class, Huffman coding has just been introduced to us. However, I don't believe I have fully grasped the concept of Hamming distance. In lecture, we were presented with this question:

The Hamming distance between two binary strings is the number of coordinates in which they differ. If $X$ and $Y$ are independent random binary strings of length 4, what is the expected value of the Hamming distance between $X$ and $Y$?

I understand this gives me 16 equally likely strings of length 4 to consider: 0000, 0001, 0010, ..., 1110, 1111. However, I don't know how to calculate the expected Hamming distance. Originally, I was mistaking this for entropy. I first tried calculating

$$\sum_{n=1}^{16} \frac1{16}\log_2\frac{1}{\frac{1}{16}} = 4$$

where $\frac{1}{16}$ is the probability of drawing any one of the 16 strings. However, I was told this is incorrect. How do I calculate the expected Hamming distance?

HINT

Expected Hamming distance is the expected number of positions where the binary strings $X$ and $Y$ are different.
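To make the definition concrete, here is a minimal sketch of the distance itself. The function name `hamming_distance` and the choice to represent binary strings as Python `str` values are assumptions made for illustration only.

```python
def hamming_distance(x: str, y: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    # True counts as 1 when summed, so this adds one per mismatched position.
    return sum(a != b for a, b in zip(x, y))


print(hamming_distance("00", "01"))  # 1
print(hamming_distance("00", "11"))  # 2
```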

For example, if you are looking at strings of length $2$, you have 4 possible values for each of $X$ and $Y$.

$$
\begin{array}{cccc}
X & Y & \text{Same} & \text{Different} \\
00 & 00 & 2 & 0 \\
00 & 01 & 1 & 1 \\
00 & 10 & 1 & 1 \\
00 & 11 & 0 & 2 \\
01 & 00 & 1 & 1 \\
01 & 01 & 2 & 0 \\
01 & 10 & 0 & 2 \\
01 & 11 & 1 & 1 \\
\vdots & \vdots & \vdots & \vdots
\end{array}
$$

You will have 16 total entries and are looking for the average value in the Different column. Looking at the table, each block of 4 entries that share the same value of $X$ contains $0,1,1,2$ in some order, so the overall average is $$ \frac{0+1+1+2}{4} = 1. $$ Hence, on average, two strings of length $2$ differ in $1$ position. Can you do this for strings of length $4$?
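If you want to check a hand computation, the table can also be enumerated by brute force. The sketch below assumes, as in the question, that $X$ and $Y$ are independent and that every string of a given length is equally likely; the function name `expected_hamming_distance` is made up for this example.

```python
from fractions import Fraction
from itertools import product


def expected_hamming_distance(n: int) -> Fraction:
    """Average Hamming distance over all ordered pairs of n-bit strings.

    Assumes X and Y are independent and uniform over all 2**n strings.
    """
    strings = list(product("01", repeat=n))
    # Sum the "Different" column over every (X, Y) row of the table.
    total = sum(
        sum(a != b for a, b in zip(x, y))
        for x in strings
        for y in strings
    )
    # There are (2**n)**2 equally likely rows.
    return Fraction(total, len(strings) ** 2)


print(expected_hamming_distance(2))  # 1
```

Calling the same function with `n = 4` enumerates all 256 rows, which you can compare against your own answer.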