Huffman coding - conditions for perfect tree output

940 Views Asked by At

The question is: Given 4 characters and their frequencies, what's the max possible difference between the frequency of the rarest character and that of the most common character, so the output Huffman tree is a perfect tree (all 4 leaves are the same depth)?

My direction: let's call the frequencies A, B, C ,D so that A is the frequency of the most common character and D is that of the rariest character. so that D < C < B < A.

So C and D will be merged first.

      C + D
     /     \
    C       D

Now in order to get a perfect tree we need A and B to be merged separately in the next step.

      A + B
     /     \
    A       B

(as opposed to B been merged with C+D resulting in a not perfect tree)

and then we get.

                       A + B + C + D
                       /          \
                    C + D        A + B  
                   /     \       /    \
                  C       D     A      B

But I can't formulate anything...

Thanks.

1

There are 1 best solutions below

5
On BEST ANSWER

Suppose we have the symbols $A,B,C,D$ ordered by probabilities: $P_A\ge P_B \ge P_C \ge P_D \ge0$.

The Huffman algorithm constructs the code by succesively merging the two symbols with lowest probabilities (reorder and repeat). In order to have a perfect binary tree (same depth for all leaves -in this case: to have all codes with length two) we need that the first merge produces a probability greater than $A$ (so it goes to the first position - otherwise, the code for $A$ would have only one bit) - this is necessary and sufficient. Hence we need

$$P_C + P_D \ge P_A$$

To bound $P_C$: $P_C+P_B+P_A\le 1$ and because $P_C$ is the minimum of those, then $P_C\le 1/3$. Hence $$ P_A - P_D \le \frac{1}{3}$$

It's easy to see that this bound is tight: just take $P_A= P_B=P_C=1/3$