Why can unused binary codes be used to make two messages of length 2?

40 Views Asked by At

I was reading Information Theory: A Concise Introduction by Stefan Hollos and J. Richard Hollos. I was somewhat confused by their explanation of the Kraft-McMillian Inequality:

Let $n_1$ be the number of code words of length one. Since we are using a binary code $n_1 \le 2$. The number of unused code words of length one is $2 - n_1$. Each of these can be used to create 2 length two code words so we have $n_2 \le 2(2 - n_1)$ where $n_2$ is the number of length two code words. Likewise for length three code words we have $n_3 \le 2(2(2 - n_1) - n_2)$. We could continue on for longer length code words but let's keep it simple and suppose that the longest code word has length three. Then we can arrange the last equation to get:

$$\frac{n_1}{2} + \frac{n_2}{2^2} + \frac{n_3}{2^3} \le 1$$

This inequality must be satisfied by any prefix code that has a maximum word length of three. The way that this inequality was constructed can easily be generalized to the case where the maximum code length is $M$.

I'm not sure if I'm missing something obvious, but I'm a little confused as to how the unused code words can be used to construct 2 code words each. As I understand it, $n_1$ represents a message of length 1, so we can use it to represent either 0, 1, or 2 possible messages. If we use it to only represent, for example, 1 possible message, we can use the "left over" possible message to represent a message of length 2. I can see where we could represent one message of length 2, but I'm somewhat confused as to how we could represent 1 possible message of length 1 and 2 possible messages of length 2 (since that's 3 possible messages total). Can someone explain this to me?