Interpretation of Shannon entropy


I've often seen, as an explanation of Shannon entropy, that it represents "the average number of yes/no questions needed to find out which state a given system is in."

This works very well for a system whose number of possible states is a power of two, but if I try it with, say, 3 equiprobable states, I don't find it to be true. Indeed, the entropy is $H = -\sum_{n=1}^{3}\frac{1}{3}\log_2\frac{1}{3} \approx 1.58$, so I should need an average of 1.58 yes/no questions. But if I encode the $1^{st}$ state as 0, the $2^{nd}$ state as 1 and the $3^{rd}$ as 00, the average number of bits (questions) I need to characterize the system is $\frac{4}{3} \approx 1.33$.
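For reference, the entropy figure can be checked numerically; a minimal Python sketch:

```python
import math

# Entropy of three equiprobable states:
# H = -sum_i p_i * log2(p_i) = log2(3) ≈ 1.585 bits
p = [1/3, 1/3, 1/3]
H = -sum(pi * math.log2(pi) for pi in p)
print(H)  # ≈ 1.585
```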

Did I make a mistake somewhere, or does it mean that this explanation with the yes/no questions is just a way to simplify things?



BEST ANSWER

First, your proposed encoding $A\to 0$, $B \to 1$, $C\to 00$ attains a coding length below the entropy, which should be impossible.

The problem is that your encoding is practically useless, because it's not "uniquely decodable": if you receive $00$ you cannot know if the input was $AA$ or $C$.

Then you need to do something like $A\to 0$, $B \to 10$, $C\to 11$, which has an average coding length of $5/3 \approx 1.67$, above the entropy ($H \approx 1.585$). This is to be expected.
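The failure of the first code can be made precise with the Kraft inequality: any uniquely decodable binary code must satisfy $\sum_i 2^{-l_i} \le 1$ (Kraft–McMillan). A small Python sketch checking both codes (the dictionary names are just for illustration):

```python
# Kraft-inequality check: the proposed code {0, 1, 00} violates it,
# while the prefix code {0, 10, 11} meets it with equality.

def kraft_sum(lengths):
    """Sum of 2^(-l) over the codeword lengths."""
    return sum(2.0 ** -l for l in lengths)

bad = {"A": "0", "B": "1", "C": "00"}    # not uniquely decodable
good = {"A": "0", "B": "10", "C": "11"}  # prefix code

print(kraft_sum(len(w) for w in bad.values()))   # 1.25 > 1: impossible
print(kraft_sum(len(w) for w in good.values()))  # 1.0: admissible

# Average length of the prefix code under uniform probabilities:
avg = sum(len(w) for w in good.values()) / 3
print(avg)  # 5/3 ≈ 1.667, just above H = log2(3) ≈ 1.585
```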

And if you code each value in isolation then, yes, you cannot perform better than that. But you can code several values together (code "the extension of the source") and approach the entropy.

Consider for example a group of $n=5$ values. There are $3^5=243$ equally probable joint values. Because $243 < 256 = 2^8$, you can code this group with $8$ yes-no questions.

Hence, with this simple scheme, you'd need to ask $8$ questions to discover $5$ values, which gives a coding length $8/5=1.6$, near the entropy.
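The same block-coding arithmetic, sketched in Python (the particular block sizes $n$ are just for illustration): $3^n$ equiprobable joint values need $\lceil n \log_2 3 \rceil$ yes/no questions, so the cost per value approaches $\log_2 3 \approx 1.585$.

```python
import math

# Fixed-length binary coding of blocks of n ternary values:
# 3^n equiprobable joint values need ceil(n * log2(3)) questions.
for n in (1, 5, 10, 100):
    bits = math.ceil(n * math.log2(3))
    print(f"n={n:3d}: {bits} bits total, {bits / n:.3f} bits per value")
```

For $n=5$ this reproduces the $8/5 = 1.6$ figure above; larger blocks get closer to the entropy.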

The above is not optimal (see Huffman coding), but you get the idea.

ANOTHER ANSWER

The expected code length of a Shannon-Fano code is

$ \mathbb{E}L = \sum_{i=1}^{n} p_i l_i \leq \sum_{i=1}^{n} p_i\left(-\log_2 p_i + 1\right) = -\sum_{i=1}^{n} p_i \log_2 p_i + \sum_{i=1}^{n} p_i = H(X) + 1.$

Note that the bound has a "+1" -- this is in agreement with your computations.

http://en.wikipedia.org/wiki/Shannon%E2%80%93Fano_coding

Note that Shannon-Fano codes are not optimal, but Huffman codes are. For Huffman codes, see

https://en.wikipedia.org/wiki/Huffman_coding
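For concreteness, a minimal Huffman sketch in Python (the `huffman_lengths` helper is written for this illustration, not taken from the linked page): for three equiprobable symbols it produces codeword lengths $\{1, 2, 2\}$, i.e. the same $5/3$ average as the prefix code in the accepted answer.

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Optimal binary prefix-code lengths via Huffman's greedy merge."""
    tiebreak = count()  # avoids comparing the symbol lists on ties
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol inside
        # them moves one level deeper, i.e. gains one code bit.
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths

lengths = huffman_lengths([1/3, 1/3, 1/3])
print(sorted(lengths))          # [1, 2, 2]
print(sum(lengths) / 3)         # 5/3 ≈ 1.667
```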