Things I understand:
Shannon entropy:
- is the expected amount of information in an outcome drawn from that distribution.
- In a game of 20 questions to guess an item, it is a lower bound on the average number of questions one needs to ask.
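The 20-questions intuition can be checked numerically. A minimal sketch (the function name `entropy` and the example distributions are my own choices, not from the notes above): for a uniform distribution over 16 items, each yes/no question can halve the candidate set, so 4 questions suffice, and the entropy comes out to exactly 4 bits.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(p) = -sum_i p_i * log2(p_i)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform over 16 items: each question halves the set,
# so 4 yes/no questions suffice, matching H = log2(16) = 4 bits.
uniform = [1 / 16] * 16
print(entropy(uniform))  # 4.0

# A skewed distribution has lower entropy: asking about the likely
# items first means fewer questions on average.
skewed = [0.5, 0.25, 0.125, 0.125]
print(entropy(skewed))  # 1.75
```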
Doubt:
It gives the lower bound on the number of bits needed on average to encode symbols drawn from a distribution.
I don't understand why it says "on average". Isn't it just a lower bound? Also, please elaborate on the lower bound if possible.
To see why there needs to be a notion of lower bound, consider that you need to specify a code. You could choose an arbitrarily inefficient code that takes a huge number of bits to encode anything, so there is no meaningful upper bound. The entropy is a statement about the best you can possibly do.
To see why it must be an average, note that the length of the encoded sequence depends on which symbols actually occur (since each symbol will in general have a different code-word length). Since the symbols are randomly drawn, the encoded length is itself a random variable, and the quantity that entropy bounds is its expectation.
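Both points can be illustrated with an optimal prefix code. A sketch (a standard Huffman construction, not something specified in the question; the helper `huffman_lengths` is my own): individual code words have different lengths, so the encoded length varies with the drawn symbols, but the *average* length is bounded below by the entropy, and for a dyadic distribution (all probabilities powers of 1/2) the bound is met exactly.

```python
import heapq
import math

def huffman_lengths(probs):
    """Code-word length of each symbol under a Huffman code."""
    # Heap items: (total probability, tiebreak id, symbol indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tiebreak = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # merging adds one bit to every symbol below
        heapq.heappush(heap, (p1 + p2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_lengths(probs)  # [1, 2, 3, 3]: lengths differ per symbol
avg_len = sum(p * l for p, l in zip(probs, lengths))
H = -sum(p * math.log2(p) for p in probs)
print(avg_len, H)  # both 1.75: dyadic distribution, bound met with equality
```

For non-dyadic distributions the average Huffman length is strictly above the entropy (but within one bit of it), which is why the bound is stated on the average rather than on any single encoded sequence.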