In my statistics class, Huffman coding and Hamming distance have just been introduced to us. However, I don't believe I have fully grasped the concept behind Hamming distance. In lecture, we were presented with a question:
The Hamming distance between two binary strings is the number of coordinates in which they differ. If X and Y are independent random binary strings of length 4, what is the expected value of the Hamming distance between X and Y?
I understand this gives me 16 equally likely strings of length 4 to consider: 0000, 0001, 0010, ..., 1110, 1111. However, I'm unsure how to calculate the expected Hamming distance here. Originally, I was mistaking this for entropy. I first tried calculating
$$\sum_{n=1}^{16} \frac1{16}\log_2\frac{1}{\frac{1}{16}} = 4$$
where $\frac{1}{16}$ represents the probability of being any one of the 16 possible strings. However, I was told this is incorrect. How do I calculate the expected Hamming distance?
HINT
Expected Hamming distance is the expected number of positions where the binary strings $X$ and $Y$ are different.
For example, if you are looking at strings of length $2$, you have 4 possible values for each of $X$ and $Y$.
$$\begin{matrix}
X & Y & \text{Same} & \text{Different} \\
00 & 00 & 2 & 0 \\
00 & 01 & 1 & 1 \\
00 & 10 & 1 & 1 \\
00 & 11 & 0 & 2 \\
01 & 00 & 1 & 1 \\
01 & 01 & 2 & 0 \\
01 & 10 & 0 & 2 \\
01 & 11 & 1 & 1 \\
\ldots
\end{matrix}$$
You will have 16 total entries and are looking for the average value in the Different column. After some observation it should be clear that each group of 4 entries contains $0,1,1,2$ in some permutation, so the overall average is $$ \frac{0+1+1+2}{4} = 1. $$ Hence, on average, strings of length 2 differ in 1 position. Can you do this for strings of length $4$?
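If you want to check your table (and later your length-4 answer) numerically, here is a small brute-force sketch in Python. It simply averages the Hamming distance over every equally likely pair $(X, Y)$; the function name `expected_hamming` is just my own label, not anything standard.

```python
from itertools import product

def hamming(x, y):
    # Count positions where the two strings differ.
    return sum(a != b for a, b in zip(x, y))

def expected_hamming(n):
    # Average Hamming distance over all 2^n * 2^n equally likely (X, Y) pairs.
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    total = sum(hamming(x, y) for x in strings for y in strings)
    return total / len(strings) ** 2

print(expected_hamming(2))  # 1.0, matching the table above
```

Calling `expected_hamming(4)` answers the original question, so I'll leave running that to you.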