The question is: Given 4 characters and their frequencies, what's the max possible difference between the frequency of the rarest character and that of the most common character, so the output Huffman tree is a perfect tree (all 4 leaves are the same depth)?
My direction: let's call the frequencies A, B, C ,D so that A is the frequency of the most common character and D is that of the rariest character. so that D < C < B < A.
So C and D will be merged first.
C + D
/ \
C D
Now in order to get a perfect tree we need A and B to be merged separately in the next step.
A + B
/ \
A B
(as opposed to B been merged with C+D resulting in a not perfect tree)
and then we get.
A + B + C + D
/ \
C + D A + B
/ \ / \
C D A B
But I can't formulate anything...
Thanks.
Suppose we have the symbols $A,B,C,D$ ordered by probabilities: $P_A\ge P_B \ge P_C \ge P_D \ge0$.
The Huffman algorithm constructs the code by succesively merging the two symbols with lowest probabilities (reorder and repeat). In order to have a perfect binary tree (same depth for all leaves -in this case: to have all codes with length two) we need that the first merge produces a probability greater than $A$ (so it goes to the first position - otherwise, the code for $A$ would have only one bit) - this is necessary and sufficient. Hence we need
$$P_C + P_D \ge P_A$$
To bound $P_C$: $P_C+P_B+P_A\le 1$ and because $P_C$ is the minimum of those, then $P_C\le 1/3$. Hence $$ P_A - P_D \le \frac{1}{3}$$
It's easy to see that this bound is tight: just take $P_A= P_B=P_C=1/3$