I've been having trouble with a question regarding the Huffman algorithm under a uniform probability distribution.
Suppose we have $k$ symbols $\{x_1, x_2,\dots, x_k\}$, and $c_i$ are the respective codewords associated under the binary Huffman algorithm. Suppose that the probability of the symbols $x_i$ is uniform, i.e., $\mathbb{P}(x_i)=\frac{1}{k}$.
- Determine the size of the longest codeword in terms of $k$ and the values of $k$ for which all codewords have the same length $k$.
After some trial and error, I believe the answers are respectively $\lceil \log_2(k) \rceil$ and powers of $2$ (for the second one, I can see why this is true: since the probabilities are uniform, the associated Huffman tree will always have nodes a power of 2 at each depth of the tree, but I can't write a proper reason as to why other values wouldn't work).
- Letting $L$ be the size of the longest codeword, show that all codewords have either size $L$ or $L-1$.
I can see why this should be true: the Huffman algorithm encodes the symbols with the highest probability to codewords with smaller size, so if the probability for all symbols is the same, the size of the codewords shouldn't vary a lot. But I can't write a formal proof.