On the number of optimal prefix-free binary codes

313 Views Asked by At

Let $T$ be a text of length $L$ containing the symbols $$\mathcal{A}=\{a_1, a_2, \ldots, a_n\},$$ where each symbol appears at least once and no other symbol appears in $T$.

Define the weights $$\mathcal{W} = \Big\{w_1 = \frac{c_1}{L}, \ldots, w_n = \frac{c_n}{L}\Big\},$$ where $c_i$ is the number of occurrences of $a_i$ in $T$.

Given $\mathcal{A}$ and $\mathcal{W}$, one can run Huffman's algorithm to find an optimal prefix-free binary code $$\mathcal{E} =\{e_1, e_2, \ldots, e_n\},$$ where $e_i$ is the encoding for $a_i$.

Optimality of $\mathcal{E}$ means that the quantity $$\sum_{i=1}^{n}w_i l_i$$ is minimum among all other encodings, where $l_i$ is the bit-length of $e_i$.

Optimal prefix-free binary codes are not unique in general. My goal is to establish some bounds on the number of optimal prefix-free binary codes for a given pair $(\mathcal{A},\mathcal{W})$.

An optimal prefix-free binary code can be represented as a full binary tree. In particular, each leaf corresponds to a symbol, and every edge is labeled with either $0$ or $1$. The codeword $e_i$ is obtained by reading the path from the root to $a_i$.

The number of full binary trees with $n$ leaves is given by $$C_{n-1} = \frac{(2(n-1))!}{n!(n-1)!}.$$

Moreover, there are $n!$ ways of assigning symbols to leaves.

As a consequence, the number of optimal prefix-free binary codes for a pair $(\mathcal{A},\mathcal{W})$ is bounded from above by $$C_{n-1}n!=\frac{(2(n-1))!}{(n-1)!} \sim \sqrt{2}\Big(\frac{4(n-1)}{e}\Big)^{n-1}.$$

On the other hand, given a tree and a unique way of assigning symbols to leaves, there are $2^{n-1}$ ways of labeling the tree with $0$s and $1$s (for each internal node one can label the left child with $0$ or $1$, and do the opposite for the right child).

Therefore, the number of optimal prefix-free binary codes for a pair $(\mathcal{A},\mathcal{W})$ is bounded from below by $2^{n-1}$.

My questions are:

  1. Are the bounds correct?
  2. Are there better bounds?
  3. How can I use the fact that I know $\mathcal{W}$?
  4. Is there an efficient method for computing the number of optimal prefix-free binary codes for a pair $(\mathcal{A},\mathcal{W})$?