Say I have a list of integers x = [102, 102, 102, 1, 0, -1, -2000, -3, 3, 102, 1, 0, -1, -1, -2000, -2000, -2000, -2000, -1, -1, 0, 0, 1, 3] and the frequencies for each number are:
Integer Count
102 4
3 2
1 3
0 4
-1 5
-3 1
-2000 5
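For reference, these counts can be obtained directly with `collections.Counter`:

```python
from collections import Counter

x = [102, 102, 102, 1, 0, -1, -2000, -3, 3, 102, 1, 0,
     -1, -1, -2000, -2000, -2000, -2000, -1, -1, 0, 0, 1, 3]

counts = Counter(x)
print(counts)  # Counter({-1: 5, -2000: 5, 102: 4, 0: 4, 1: 3, 3: 2, -3: 1})
```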
If I generate a Huffman tree and then encode this sequence, I find that the total number of bits necessary for encoding this list is 65 bits. In theory, the order of the integers shouldn't matter (i.e., I can shuffle this list of integers) as it should still produce the same number of bits (65).
Considering that I only want to compute the number of bits and I don't care about the actual encoding or the tree, is there a faster/better way to compute the number of bits used (i.e., 65) without having to explicitly construct the Huffman tree? Perhaps it's possible to compute the bit count from the frequency counts alone?
I believe that you are looking for the Shannon entropy -- the average number of bits per symbol under an ideal code. To get it in bits, divide the entropy in nats by log(2.0) (or compute it with log base 2 directly). I get 2.66937 bits per symbol; multiplied by 24 symbols, that is 64.0649 bits. Take the ceiling of that and you get 65. A caveat, though: entropy is only a lower bound on the Huffman-coded length, since Huffman codes are constrained to whole-bit code lengths. Taking the ceiling happens to give the exact answer for this input, but that is not guaranteed in general; the exact count still requires the Huffman code lengths (which you can compute from the frequencies without building an explicit tree).
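A minimal sketch of both computations, assuming the frequency counts from the question. The entropy estimate needs only the counts; the exact Huffman total can also be computed from counts alone by repeatedly merging the two smallest weights in a min-heap and summing each merged weight (each symbol's count is added once per level of depth it ends up at, so the running sum equals the total coded length), with no tree nodes or codewords ever materialized:

```python
import heapq
from math import log2

def entropy_bits(counts):
    """Shannon entropy in bits per symbol, from symbol counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts)

def huffman_total_bits(counts):
    """Exact total Huffman-coded length in bits, from counts alone.
    Repeatedly merge the two smallest weights and accumulate each
    merged weight; no tree or codewords are constructed."""
    heap = list(counts)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

# Frequencies of 102, 3, 1, 0, -1, -3, -2000 from the question.
counts = [4, 2, 3, 4, 5, 1, 5]
print(entropy_bits(counts) * sum(counts))  # ~64.06 bits: the lower bound
print(huffman_total_bits(counts))          # 65: the exact Huffman total
```

For 7 distinct symbols the difference is negligible, but on large alphabets the heap method is O(k log k) in the number of distinct symbols, versus O(n) symbols to actually encode, so it is already much cheaper than producing the encoded output.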