Upper bound on Huffman codeword length


I am reading Elements of Information Theory by Cover and Thomas and have been unable to find an upper bound on the length of a codeword in a Huffman code, either in this book or on the web. Does one exist? If so, could someone provide an outline of a proof and an example achieving this bound? Please assume that we don't know the number of symbols to be encoded in advance, only the probability of a given codeword.

Best answer

The length of a codeword is simply the length of the path from the root of the tree in the construction of the Huffman code to the leaf corresponding to the symbol that it codes for. If you know the total number of symbols to be coded for, then you know the number of leaves, so what is the maximum path length possible?
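To experiment with this hint, here is a minimal Python sketch that builds one Huffman tree and reports each symbol's codeword length. The function name `huffman_lengths` and the sample distributions are illustrative assumptions, not taken from the book; ties are broken by insertion order, which is only one of several valid choices.

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Codeword length of each symbol in one Huffman tree (hypothetical helper)."""
    tie = count()  # unique tie-breaker, so the heap never has to compare the lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        # Merge the two lightest nodes; every symbol beneath them gets one level deeper.
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths

# A very skewed source with n = 5 symbols: the tree degenerates into a path.
skewed = huffman_lengths([1/2, 1/4, 1/8, 1/16, 1/16])
# A uniform source with n = 4 symbols: the tree is perfectly balanced.
uniform = huffman_lengths([0.25] * 4)
print(skewed, uniform)
```

With the skewed source the longest codeword has length $4 = n - 1$, while the uniform source gives length $2$ for every symbol, so the number of leaves alone already caps the path length.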

After edit

If you only have the probability of a codeword, then you need to have a finer grasp of the Huffman tree. You should of course first understand the proof of the Huffman code optimality, since the core property is the key to most of its other properties. Recall that in the tree the weight of a leaf is just the probability of the symbol and the weight of an internal node is just the sum of the children's weights.

The core property is that in any optimal tree, for any two nodes $a,b$, if $a$ is heavier than $b$ then $a$ is not deeper than $b$; otherwise swapping the subtrees rooted at $a$ and $b$ would give a strictly better tree. This is key to proving that any optimal tree can be transformed into another optimal tree that agrees with the code generated by the Huffman algorithm.

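This same core property is key for this question too. We want for any node an upper bound on its depth given a lower bound on its weight. Equivalently, we want an upper bound on its weight given a lower bound on its depth. It is tempting to say that the weight halves at each step down, but that only holds along the path that always goes to the lighter child, and that path may end at a shallow leaf, so it says nothing about deeper nodes elsewhere in the tree.

Instead, take a node $v$ of weight $w$ at depth $k$ and walk up through its ancestors $v = v_k, v_{k-1}, \dots, v_0$, where $v_0$ is the root. Write $W(x)$ for the weight of node $x$ and let $s_i$ be the sibling of $v_i$, which is also at depth $i$. By the core property (a deeper node is never strictly heavier than a shallower one) we have $W(s_i) \ge W(v_{i+1})$, so $W(v_{i-1}) = W(v_i) + W(s_i) \ge W(v_i) + W(v_{i+1})$ for $1 \le i < k$, and trivially $W(v_{k-1}) \ge w$. By induction $W(v_{k-j}) \ge F_{j+1} w$, where $F_1 = F_2 = 1$, $F_3 = 2, \dots$ are the Fibonacci numbers. Since the root has weight $1$, this gives $w \le 1/F_{k+1}$. From this it is easy to get an upper bound on the depth given the weight: a symbol of probability $p$ has a codeword of length at most the largest $k$ with $F_{k+1} \le 1/p$, which is $\log_\varphi(1/p)$ plus a constant, where $\varphi = (1+\sqrt{5})/2$. The bound is close to tight: for the probabilities $(0.34, 0.33, 0.32, 0.01)$ the symbol of probability $0.32$ ends up at depth $3$, just under $1/F_4 = 1/3$.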
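A depth-versus-weight bound of this kind can be sanity-checked numerically. The sketch below tests the Fibonacci-type bound $w \le 1/F_{k+1}$ for a leaf of weight $w$ at depth $k$ (with $F_1 = F_2 = 1$), which follows from the core property. The helper names `leaf_depths`, `fib` and `satisfies_bound` are hypothetical, and the weights are unnormalized integers (an assumption made to avoid floating-point ties), so the test reads $w \cdot F_{k+1} \le$ total weight.

```python
import heapq
from itertools import count

def leaf_depths(weights):
    """Depth of each leaf in one Huffman tree built from the given weights."""
    tie = count()  # unique tie-breaker, so the heap never compares the lists
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    depth = [0] * len(weights)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        for i in a + b:  # every leaf under the merged node moves one level down
            depth[i] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), a + b))
    return depth

def fib(n):
    """Fibonacci numbers with F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def satisfies_bound(weights):
    """Check w * F_{k+1} <= total weight for every leaf (weight w, depth k)."""
    total = sum(weights)
    return all(w * fib(k + 1) <= total
               for w, k in zip(weights, leaf_depths(weights)))

# The distribution (0.34, 0.33, 0.32, 0.01), written as integer percentages.
near_tight = [34, 33, 32, 1]
print(leaf_depths(near_tight), satisfies_bound(near_tight))
```

Here the symbol of weight $32$ sits at depth $3$, and $32 \cdot F_4 = 96 \le 100$, so the bound holds with little room to spare. A passing check on a few inputs is of course only evidence, not a proof.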