Shannon's Theorem 4 on Entropy of an Information Source


In Shannon's paper "A Mathematical Theory of Communication," section I.7 "The Entropy of an Information Source," he discusses some results regarding the entropy $H$.

In Theorem 3, he states the following:

Theorem 3: Given any $\epsilon > 0$ and $\delta > 0$, we can find an $N_0$ such that the sequences of any length $N \ge N_0$ fall into two classes:

  1. A set whose total probability is less than $\epsilon$.
  2. The remainder, all of whose members have probabilities satisfying the inequality $$\big| \frac{\log p^{-1}}{N} - H \big| < \delta$$
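To make sure I'm reading Theorem 3 correctly, I tried a small numerical check for a Bernoulli($p$) source, grouping the $2^N$ sequences of length $N$ by their number of ones (this is my own sketch, not from the paper):

```python
import math

def entropy(p):
    """Binary entropy H in bits per symbol."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def atypical_mass(p, N, delta):
    """Total probability of the 'bad' set in Theorem 3: sequences whose
    per-symbol log-probability -log2(p(x))/N deviates from H by delta or more."""
    H = entropy(p)
    mass = 0.0
    for k in range(N + 1):                      # k = number of ones
        logp = k * math.log2(p) + (N - k) * math.log2(1 - p)
        if abs(-logp / N - H) >= delta:
            mass += math.comb(N, k) * p**k * (1 - p)**(N - k)
    return mass

# The probability of the atypical set shrinks as N grows, as the theorem predicts.
for N in (10, 100, 1000):
    print(N, atypical_mass(0.3, N, 0.1))
```

The mass of the first class (the exceptional set) can be made smaller than any $\epsilon$ by taking $N$ large enough, which matches the theorem's statement.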

He then introduces Theorem 4 as follows:

A closely related result deals with the number of sequences of various probabilities. Consider again the sequences of length $N$ and let them be arranged in order of decreasing probability. We define $n(q)$ to be the number we must take from this set starting with the most probable one in order to accumulate a total probability $q$ for those taken.

Theorem 4: $$\lim_{N \to \infty} \frac{\log n(q)}{N} = H$$ when $q$ does not equal $0$ or $1$.

I don't understand the definition of $n(q)$ here, or what he means by accumulating a total probability $q$. Does he mean adding up the probabilities of the sequences, from the most probable one down, until the running sum reaches $q$? Or does he mean multiplying them to get the probability of the intersection of these sequences (assuming independence)?
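If the first (summing) reading is right, then for a Bernoulli($p$) source with $p < 1/2$ the most probable sequences are exactly those with the fewest ones, and $n(q)$ can be computed by sweeping over the number of ones $k = 0, 1, \dots$ Here is a quick experiment I tried along those lines (my own sketch, not from the paper):

```python
import math

def log_n_q_over_N(p, N, q):
    """Compute log2(n(q))/N for a Bernoulli(p) source with p < 1/2, reading
    'accumulate' as summing probabilities from the most probable sequence down.
    Sequences with fewer ones are more probable, so sweep k = 0, 1, ..."""
    count, mass = 0, 0.0
    for k in range(N + 1):
        c = math.comb(N, k)                      # sequences with exactly k ones
        pk = p**k * (1 - p)**(N - k)             # probability of each of them
        if mass + c * pk >= q:
            count += math.ceil((q - mass) / pk)  # take only as many as needed
            break
        count += c
        mass += c * pk
    return math.log2(count) / N

H = -(0.3 * math.log2(0.3) + 0.7 * math.log2(0.7))   # ~0.8813 bits/symbol
for N in (20, 200, 1000):
    print(N, log_n_q_over_N(0.3, N, 0.5), "vs H =", H)
```

Under this reading, $\frac{\log_2 n(q)}{N}$ does appear to creep up toward $H$ as $N$ grows, which is at least consistent with the statement of Theorem 4.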

Also, what does the function $n(q)$ tell us about the information source? And what does this theorem imply?

I'd also appreciate a proof of theorem 4 if possible.