Huffman Encoding Proof Probability and Length


If the frequency of symbol i is strictly larger than the frequency of symbol j, then the length of the codeword for symbol i is less than or equal to the length of the codeword for symbol j.

I attempted to solve this by using Induction.

Basis Step: With two symbols, say A with probability 0.6 and B with probability 0.4, Huffman assigns each a one-bit codeword (A gets 0, B gets 1). Both lengths are equal, so the claim holds trivially.

Induction Step: Assume that it works correctly for k symbols.

How to show it works for k+1?

Is using induction the right thing to do here?


Accepted answer:

If you already know the theorem about the optimality of the Huffman code, then by all means look at Batman’s answer. If not, however, it may be that you’re intended to prove the result directly from the mechanics of the algorithm for constructing the Huffman code. That argument is a bit messy; I’ll sketch it and leave you to fill in quite a few details.

The argument is by induction on the number of symbols. Suppose that $k\ge 3$, and as induction hypothesis assume the result for fewer than $k$ symbols. Now consider a Huffman tree for $k$ symbols, and suppose, for a contradiction, that there are symbols $i$ and $j$ such that $f_i>f_j$, where $f_\sigma$ is the frequency of $\sigma$, and $L(i)>L(j)$, where $L(\sigma)$ is the length of the code for $\sigma$. Note that $L(\sigma)$ is also the depth of node $\sigma$ in the tree.

Find the youngest (lowest) common ancestor of nodes $i$ and $j$; call that node $v$. If $v$ is not the root of the tree, show that the subtree rooted at $v$ is a Huffman tree for all of the symbols descended from it, including $i$ and $j$; there are fewer than $k$ of these, so you can apply the induction hypothesis to conclude that $L(i)\le L(j)$, thereby ruling out this case.

The other possibility is that $v$ is the root of the tree, so that the codes for $i$ and $j$ differ in the first bit. Let node $u_1$ be the parent of node $i$ and node $w_1$ the parent of node $j$. Show that the algorithm for constructing the Huffman tree ensures that $f_{u_1}$, the total frequency of all symbols in the subtree rooted at $u_1$, is greater than $f_{w_1}$. Moreover, $L(u_1)>L(w_1)$: node $u_1$ is deeper in the tree than node $w_1$, so nodes $u_1$ and $w_1$ are related exactly as nodes $i$ and $j$ were, but one level higher in each case. Thus, you can repeat the argument: let node $u_2$ be the parent of node $u_1$, and node $w_2$ the parent of node $w_1$, and conclude that $f_{u_2}>f_{w_2}$ and $L(u_2)>L(w_2)$. After finitely many steps you'll reach nodes $u_n$ and $w_n$ such that $w_n$ is the root of the tree and $u_n$ isn't, and that leads immediately to an easy contradiction.
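If you want to sanity-check the claimed monotonicity on concrete inputs while working through the argument, here is a minimal Python sketch of the standard heap-based Huffman construction (the symbol names and frequencies below are made up for illustration); it returns each symbol's codeword length, i.e. its depth in the tree:

```python
import heapq
from itertools import count

def huffman_lengths(freqs):
    """Return {symbol: codeword length} via the standard Huffman build.

    freqs maps symbol -> frequency. Repeatedly merge the two
    lowest-frequency subtrees; the counter breaks frequency ties so
    the heap never tries to compare the dicts themselves.
    """
    tie = count()
    # each heap entry: (subtree frequency, tiebreaker, {symbol: depth})
    heap = [(f, next(tie), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # merging pushes every symbol in both subtrees one level deeper
        merged = {s: d + 1 for part in (d1, d2) for s, d in part.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

# Illustrative frequencies; a strictly higher frequency should never
# receive a strictly longer codeword.
freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
lengths = huffman_lengths(freqs)
assert all(lengths[i] <= lengths[j]
           for i in freqs for j in freqs if freqs[i] > freqs[j])
```

This only tests instances, of course; it proves nothing, but it is a quick way to convince yourself the statement is plausible before investing in the inductive details.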

Answer:

You don't need induction -- if symbol $i$ has a higher probability than symbol $j$ but a longer codeword than $j$, then one can decrease the average codeword length by swapping the codewords for $i$ and $j$, contradicting the optimality of Huffman codes.

More specifically, the difference in average code length between the original code (with $i$ having a longer codeword than $j$) and the code with $i$ and $j$'s codewords swapped is $p_i \ell_i + p_j \ell_j - (p_i \ell_j + p_j \ell_i) = (p_i - p_j)(\ell_i - \ell_j) > 0$, so the original code is strictly longer than the swapped code.
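The inequality is easy to verify numerically; a minimal sketch, with arbitrary illustrative values satisfying $p_i > p_j$ and $\ell_i > \ell_j$:

```python
# Swap argument: if p_i > p_j but ell_i > ell_j, swapping the two
# codewords strictly decreases the average length.
p_i, p_j = 0.3, 0.1    # probabilities, p_i > p_j (illustrative values)
ell_i, ell_j = 5, 3    # codeword lengths, ell_i > ell_j

original = p_i * ell_i + p_j * ell_j  # these symbols' contribution before the swap
swapped = p_i * ell_j + p_j * ell_i   # their contribution after the swap

# the difference factors as (p_i - p_j) * (ell_i - ell_j), which is positive
diff = original - swapped
print(diff)  # positive (here about 0.4), so the swap shortens the code
```

Since the contributions of all other symbols are unchanged by the swap, a positive difference here means the whole average length drops, which is exactly the contradiction with optimality.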

You can see Chapter 5 of Cover and Thomas' Elements of Information Theory 2e for more characterizations of the structure of a Huffman code (e.g. longest codewords have the same length and differ only in the last bit), or Variations on a Theme of Huffman by Gallager (IEEE Trans. IT).