How many bits does the longest encoded symbol have, using Huffman coding on a given probability distribution?


We have an alphabet of 1024 symbols, and we know that the rarest symbol has a probability of occurrence equal to $10^{-6}$. Now we want to code all the symbols with Huffman coding. How many bits will the longest encoded symbol have? How many encoded symbols will have that same length?

I tried to imagine a Huffman tree for it, but it doesn't seem to get me anywhere: Huffman trees aren't always balanced, so I can't use the tree's height to calculate the answer. I know that the two least-probable symbols will have the longest codes, but how do I relate a symbol's probability to the length of its Huffman code? Or is there some other way to calculate it?

Best answer

I don't think there's enough information to answer the question. Here's a simple pair of examples, each with just 4 symbols $a,b,c,d$:

  1. The symbols have probabilities $\frac6{34},\frac6{34},\frac{11}{34},\frac{11}{34}$. In this case the tree will be $((a b) (c d))$, with maximum code length 2 (and 4 symbols having that length).
  2. The symbols have probabilities $\frac6{34},\frac6{34},\frac9{34},\frac{13}{34}$. In this case the tree will be $(((a b) c) d)$, with maximum code length 3 (and 2 symbols having that length).

The minimum probability is the same in these two examples, but the maximum code length is not. (I imagine a similar example can be cooked up to match your question more exactly.)
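The two examples can be checked mechanically with the standard greedy Huffman construction: repeatedly merge the two lightest subtrees, adding one bit to every symbol they contain. A minimal sketch (the function name `huffman_code_lengths` is my own; integer weights out of 34 stand in for the probabilities):

```python
import heapq
from itertools import count

def huffman_code_lengths(weights):
    """Return the Huffman code length of each symbol, given its weight.

    Greedy construction: repeatedly merge the two lightest subtrees;
    each merge adds one bit to every symbol inside the merged pair.
    """
    tiebreak = count()  # keeps heap entries comparable when weights tie
    # heap entries: (subtree weight, tiebreak, symbol indices in subtree)
    heap = [(w, next(tiebreak), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1  # these symbols move one level deeper
        heapq.heappush(heap, (w1 + w2, next(tiebreak), s1 + s2))
    return lengths

# The two 4-symbol examples above, as integer weights out of 34:
print(huffman_code_lengths([6, 6, 11, 11]))  # [2, 2, 2, 2]
print(huffman_code_lengths([6, 6, 9, 13]))   # [3, 3, 2, 1]
```

Both runs start from the same minimum weight 6/34, yet the maximum code length differs (2 vs. 3), which is exactly the point of the counterexample.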