Source Words & Huffman Codes

164 Views Asked by At

A source $S$ has source words $w_1, w_2, \ldots, w_n$, with probabilities $p_1 \geq p_2 \geq \ldots \geq p_n > 0$. Let $C$ be a binary Huffman code for $S$, and let $l$ be the length of the longest code word in $C$. Let $L(C)$ be the expected length of $C$.

Suppose you replace $w_n$ by two source words $w_a$ and $w_b$ with positive probabilities $p_a$ and $p_b$ respectively, where $p_a + p_b = p_n$. Call the new Huffman Code $C’$. What is $L(C’) – L(C)$?

So what I understand is that $L(C) = l_1p_1 + l_2p_2 + \ldots + l_np_n$ and $L(C’) = l_1p_1 + l_2p_2 + \ldots + (l_ap_a + l_bp_b)$ So if I were to find the difference $L(C’) – L(C)$ I am thinking that I would simply be left with $l_ap_a + l_bp_b - l_np_n$, but I have a feeling there is more to this than I presume.

Any ideas or suggestions would be greatly helpful!

1

There are 1 best solutions below

0
On

One big assumption that you are making is that nothing else changes, which is not generally true. But in your case it is true. Consider the algorithm that calculates an optimal code for your problem. If you run this algorithm for $S'$ at first step it merges the two lowest probabilities $p_a$ and $p_b$ and creates $p_n$. The rest of algorithm will run exactly like the source $S$.

One more observation that we should make is that the two new symbols will both have the same length (because they were merged together). and this length is exactly one more than $l_n$ (because algorithm runs the same way after that merge).

Now calculating what you already have: $l_a \times p_a + l_b \times p_b - l_n \times p_n = l_a \times (p_a + p_b) - l_n \times p_n = l_a \times p_n - l_n \times p_n = (l_a - l_n) \times p_n = 1 \times p_n = p_n $