Shannon entropy of languages


In his paper "Prediction and Entropy of Printed English", Shannon defines the entropy $H$ of a language as

$$H = \lim_{N \to \infty} F_N$$

where $$F_N = -\sum_{i, j} p(b_i, j) \log p(j | b_i)$$

where $b_i$ is a block of $N-1$ letters and $j$ is the letter that follows it.

The interpretation is: $F_N$ is the average surprise of the next letter $j$ following a block $b_i$ of length $N-1$, each continuation weighted by the probability $p(b_i, j)$ of that pair occurring in the first place.
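As a concrete sketch (not taken from the paper), $F_N$ can be estimated from empirical $N$-gram counts in a text; the helper name `F_N` is my own:

```python
import math
from collections import Counter

def F_N(text, N):
    """Estimate Shannon's F_N in bits: the conditional entropy of the next
    letter given the preceding N-1 letters, from empirical N-gram counts."""
    # Count each N-gram (b_i, j) and, over the same positions, its prefix b_i,
    # so that count(b_i) = sum_j count(b_i, j).
    ngrams = Counter(text[k:k + N] for k in range(len(text) - N + 1))
    prefixes = Counter(text[k:k + N - 1] for k in range(len(text) - N + 1))
    total = sum(ngrams.values())
    # F_N = -sum_{i,j} p(b_i, j) * log2 p(j | b_i)
    return -sum((c / total) * math.log2(c / prefixes[g[:-1]])
                for g, c in ngrams.items())
```

For $N = 1$ this reduces to the unigram letter entropy, and for a perfectly predictable text such as `"abababab"` the estimate $F_2$ is zero, since each letter determines the next.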

Shannon further states that

$$F_N = -\sum_{i, j} p(b_i, j) \log p(j | b_i) = -\sum_{i, j} p(b_i, j) \log p(b_i, j) + \sum_i p(b_i) \log p(b_i)$$

We arrive at this equivalence as follows:

\begin{align} -\sum_{i, j} p(b_i, j) \log p(j | b_i) & = - \sum_{i, j} p(b_i, j) \log \frac{p(b_i, j)}{p(b_i)}\\ & = - \sum_{i, j} p(b_i, j) \left( \log p(b_i, j) - \log p(b_i) \right) \\ & = - \sum_{i, j} p(b_i, j) \log p(b_i, j) + \sum_{i, j} p(b_i, j) \log p(b_i) \\ & \quad \text{(marginal distribution in the second sum: $\textstyle\sum_j p(b_i, j) = p(b_i)$)} \\ & = - \sum_{i, j} p(b_i, j) \log p(b_i, j) + \sum_{i} p(b_i) \log p(b_i) \\ \end{align}
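The marginalization step can be checked numerically on a toy joint distribution (the probabilities below are arbitrary, chosen only for illustration):

```python
import math

# Toy joint distribution p(b, j) over prefixes b and next letters j.
p_joint = {("a", "a"): 0.1, ("a", "b"): 0.3, ("b", "a"): 0.4, ("b", "b"): 0.2}

# Marginal p(b) = sum_j p(b, j).
p_prefix = {}
for (b, _), p in p_joint.items():
    p_prefix[b] = p_prefix.get(b, 0.0) + p

# Left side: conditional entropy -sum_{b,j} p(b, j) log2 p(j | b).
lhs = -sum(p * math.log2(p / p_prefix[b]) for (b, _), p in p_joint.items())

# Right side: -sum p(b, j) log2 p(b, j) + sum p(b) log2 p(b).
rhs = (-sum(p * math.log2(p) for p in p_joint.values())
       + sum(p * math.log2(p) for p in p_prefix.values()))

assert abs(lhs - rhs) < 1e-12  # the two expressions agree
```

In information-theoretic terms this is the chain rule $H(J \mid B) = H(B, J) - H(B)$: the conditional entropy equals the joint entropy minus the entropy of the prefix.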


Best answer

My question was about the interpretation of the formula. I answered it myself while writing the question:

The interpretation is: $F_N$ is the average surprise of the next letter $j$ following a block $b_i$ of length $N-1$, each continuation weighted by the probability $p(b_i, j)$ of that pair occurring in the first place.

But since references to this paper are sparse, and the only version I found is a badly scanned copy that is hard to read, perhaps this derivation helps somebody.