Does Shannon encoding represent a lower bound on the number of bits needed?


In textbook discussions it is common to characterize Shannon entropy as giving (emphasis added)

a lower bound on the number of bits ... needed on average to encode symbols drawn from a distribution

I understand this to mean, more precisely, that it is the average number of bits used by the most efficient encoding (in terms of number of bits) to encode symbols from that distribution (or, equivalently, the expected number of bits per symbol drawn from it).

But is it true that the Shannon entropy is a lower bound?

For example, if I have 5 symbols with frequencies

$$\frac{1}{2},\frac{3}{16},\frac{1}{8},\frac{1}{8},\frac{1}{16}$$

the Shannon entropy is $1.95$ bits (using ideal code lengths of 1, 2.4, 3, 3, and 4 bits respectively), but I can encode these symbols as

$$0, 1, 00, 10, 11$$

for an average of $1.31$ bits per symbol; or (if for some reason the codewords must be distinct) as

$$0,1,10,11,100$$

for an average of $1.38$ bits per symbol.
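To make the comparison concrete, here is a small Python sketch (variable names are my own) that reproduces the numbers above from the given frequencies:

```python
import math

# Symbol probabilities from the example above
p = [1/2, 3/16, 1/8, 1/8, 1/16]

# Shannon entropy H = -sum_i p_i log2(p_i), roughly 1.95 bits
H = -sum(q * math.log2(q) for q in p)

# Average lengths of the two proposed encodings
avg1 = sum(q * l for q, l in zip(p, [1, 1, 2, 2, 2]))  # 0, 1, 00, 10, 11
avg2 = sum(q * l for q, l in zip(p, [1, 1, 2, 2, 3]))  # 0, 1, 10, 11, 100

print(round(H, 2), avg1, avg2)  # 1.95 1.3125 1.375
```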

Is there some requirement placed on the encodings assumed in the definition of entropy that my examples violate? Is it true that Shannon encoding represents a lower bound on the number of bits needed?


2 Answers

Answer (score 9):

The Shannon entropy represents a lower bound on the average number of bits needed to represent the information symbols without losing any information. In other words, the code should be uniquely decodable. The examples you gave do not constitute a uniquely-decodable code. For example, how do you decode 010? Is it 0, 1, 0 or 0, 10?

An example of a uniquely decodable code is 0, 10, 110, 1110, and 1111. With this code, every encoded bit sequence corresponds to exactly one sequence of source symbols. For example, the encoded sequence 01110101111110 corresponds to adbec, where a, b, c, d and e denote the uncoded symbols (assigned the codewords in that order).

Answer (score 7):

You are comparing different things. What your "code" achieves is to minimize the quantity $\sum_i p_i l_i$, where $p_i$ and $l_i$ denote the probability and description length of source symbol $i$. However, the resulting code is not useful for transmitting information since, as @Math Lover correctly noted, the generated bit sequences are not uniquely decodable. You would also need to specify what you mean by "delimiters": delimiter information requires its own bits to send, and would therefore reduce the overall efficiency of the code.

What we want is to design codes that are uniquely decodable and, moreover, instantaneous, i.e., one can decode all symbols up to a certain bit in the sequence without having to look at the entire string of bits. Under this condition, the optimal source code again minimizes $\sum_i p_i l_i$ (as you did), but now subject to the constraint $\sum_i 2^{-l_i} \leq 1$, known as the Kraft inequality, which every instantaneous code can be shown to satisfy. It is for this constrained optimization problem that the Shannon entropy provides a lower bound on the objective.
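As a check, here is a short Python sketch (the helper name `kraft` is mine) that evaluates the Kraft sum for the codeword lengths discussed in this thread:

```python
# Kraft sum: sum over symbols of 2^(-l_i). A prefix (instantaneous)
# binary code with the given codeword lengths exists iff the sum is <= 1.
def kraft(lengths):
    return sum(2.0 ** -l for l in lengths)

p = [1/2, 3/16, 1/8, 1/8, 1/16]

# Lengths of 0, 1, 00, 10, 11 from the question: Kraft sum 1.75 > 1,
# so no prefix code can have these lengths.
print(kraft([1, 1, 2, 2, 2]))

# Lengths of the prefix code 0, 10, 110, 1110, 1111: Kraft sum 1.0 <= 1.
print(kraft([1, 2, 3, 4, 4]))

# Its average length respects the entropy bound (H is about 1.95 bits).
avg = sum(q * l for q, l in zip(p, [1, 2, 3, 4, 4]))
print(avg)  # 2.0 >= 1.95
```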