How to calculate the probabilty of symbol in huffman code?

141 Views Asked by At

I have a question that I tried to solve but I always get stuck..

The following Huffman code for an alphabet consisting of five symbols A to E

A - 00
B - 01
C - 10
D - 110 
E - 111

Given that the probabilities for A to E are descending that is : P(A) > P (B) > P (C) > P (D) > P(E) What are the maximal and minimal values for P(B) and why?

I tried to use the Kraft inequality for the length code words and than to calculate the average code word of the given encoding but I don't see how this would help to solve the question..

1

There are 1 best solutions below

1
On BEST ANSWER

If $A$ uses more than one bit, there can be at most three symbols using two bits, and this just happens to be the given encoding; so any other code with at least two bits for $A$ is trivially worse than the given one. Hence we should check if an encoding where $A$ uses only one bit could be better than the given one.

Then if $B$ uses two bits, the three others must all use at least three bits, that is a pixed two bits prefix to distinguish from $A$ and $B$ and then some; but Huffman for three symbols always results in $0, 10, 11$, so we ask: When would

A - 0
B - 10
C - 110
D - 1110
E - 1111

be better?

Alternatively, we there might be a code that assigns one bit to $A$ and three bits to $B$. In fact, assigning three bits to all (except $A$) gives us a valid code, so we ask: When would

A - 0
B - 100
C - 101
D - 110
E - 111

be better?

For the first, we find as necessary condition $$ P(A)+2P(B)+3P(C)+4P(D)+4P(E)\ge 2P(A)+2P(B)+2P(C)+3P(D)+3P(E)$$ i.e. $$\tag1 P(C)+P(D)+P(E)\ge P(A).$$ For the second, we find $$ P(A)+3P(B)+3P(C)+3P(D)+3P(E)\ge 2P(A)+2P(B)+2P(C)+3P(D)+3P(E)$$ i.e. $$\tag2 P(B)+P(C)\ge P(A).$$ From $(1)$ we see $$1=P(A)+\ldots+P(C) \ge P(A)+P(B)+P(A)> 3P(B)$$ so $P(B)<\frac13$. From $(2)$ we see that $$1=P(A)+\ldots+P(D)\le 2P(B)+2P(C)+P(D)+P(E)<6P(B) $$ so $P(B)>\frac16$. And indeed, probabilities $(\frac13,\frac13-\epsilon,\frac19+2\epsilon,\frac19,\frac19-\epsilon)$ and $(\frac13-2\epsilon,\frac16+2\epsilon,\frac16+\epsilon,\frac16,\frac16-\epsilon)$ lead to the encoding of the problem statement.