Entropy of a race


I have an old exercise book without solutions, so I can't check my work against the answer. I would like the concept to be explained.

In a race between 17 people, the 1st person has probability 3/4 of winning, and each of the other 16 has probability 1/64.

If after the race we get a message that the winner was not the 1st person, how much information does that message contain, and how much uncertainty is left about the winner of the race?


There are 3 answers below.

Answer 1 (P. Quinton):

The event that somebody other than the first runner wins has probability $1/4$. The amount of information is usually measured with the $-\log_2$ function (at least in information theory, for entropy and related measures), so the information contained in the message is $-\log_2(1/4) = 2$ bits.

The uncertainty left about the race is the expected amount of information that remains. Once we know the first runner isn't the winner, the winner is by symmetry uniformly distributed over the $16$ remaining runners, hence the entropy is $-16\cdot\frac{1}{16}\log_2\left( \frac{1}{16} \right)=4$ bits.
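Both numbers can be sanity-checked with a short Python snippet (a sketch using only the quantities from this answer):

```python
from math import log2

# Information in the message "the winner is not runner 1":
# the event has prior probability 1 - 3/4 = 1/4.
info_message = -log2(1 - 3/4)

# Remaining uncertainty: the winner is now uniform over the 16 other runners.
h_remaining = -16 * (1/16) * log2(1/16)

print(info_message, h_remaining)  # 2.0 4.0
```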

Answer 2 (Adriano):

Let $X$ be a random variable representing the winner, with possible values $\{x_1, x_2, \ldots, x_{17}\}$. Then its pmf is: $$ P(X = x_i) = \begin{cases} 3/4 & \text{if } i = 1 \\ 1/64 & \text{otherwise} \end{cases} $$

The Shannon entropy (i.e.: uncertainty) before the race is: $$ H(X) = -\sum_{i=1}^{17} P(x_i) \log_2 P(x_i) = -\frac{3}{4}\log_2 \frac{3}{4} - 16\left( \frac{1}{64}\log_2 \frac{1}{64} \right) = 1.8112\ldots $$
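The numerical value can be reproduced directly (a quick Python check of the sum above):

```python
from math import log2

# Prior pmf of the winner: p(x1) = 3/4, p(xi) = 1/64 for the other 16 runners.
probs = [3/4] + [1/64] * 16
assert abs(sum(probs) - 1) < 1e-12  # sanity: a valid pmf

# Shannon entropy H(X) = -sum p log2 p
H_X = -sum(p * log2(p) for p in probs)
print(round(H_X, 4))  # 1.8113
```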

With the message, the new pmf becomes: $$ P(X = x_i \mid X \neq x_1) = \begin{cases} 0 & \text{if } i = 1 \\ 1/16 & \text{otherwise} \end{cases} $$

So after receiving the message, the Shannon entropy (i.e.: uncertainty) becomes: \begin{align*} H(X \mid X \neq x_1) &= -\sum_{i=1}^{17} P(X = x_i \mid X \neq x_1) \log_2 P(X = x_i \mid X \neq x_1) \\ &= 0 - 16\left( \frac{1}{16}\log_2 \frac{1}{16} \right) \\ &= 4 \end{align*}

So the information gained was: $$ H(X) - H(X \mid X \neq x_1) = 1.8112 \ldots - 4 = -2.1887\ldots $$
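The difference of the two entropies can be checked numerically (a sketch mirroring the formulas above):

```python
from math import log2

# Prior entropy (before the message) and posterior entropy (after it).
H_prior = -(3/4) * log2(3/4) - 16 * (1/64) * log2(1/64)
H_post = -16 * (1/16) * log2(1/16)

print(round(H_prior - H_post, 4))  # -2.1887
```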


Remark: It should make sense that the message provided negative information, since the result of the race suddenly became more uncertain. The 1st person used to be a sure bet - now anyone can win. In general, uniform distributions have more uncertainty than heavily skewed distributions.

Answer 3:

The two existing answers contradict each other.

P. Quinton's answer is correct: the message carried $2$ bits of information, and the remaining uncertainty is $4$ bits. The entropy calculations in Adriano's answer are also correct. However, the conclusion that the information "gained" is the difference of these entropies, and thus negative, is incorrect.

The information gained is determined only by the probability of the message received, not by the rest of the distribution. The entropy of the distribution gives the expected information gain upon receiving a message distributed accordingly. The fact that the conditional distribution after receiving the message has a higher entropy, and thus more uncertainty, than the prior one does not contradict the fact that a positive amount of information was received. It's just that before receiving the message, we expected the race result to contain little information, whereas now that we've received the message, we expect it to contain much information.

We can see in more detail how this fits together by adding the information from the two steps in which we receive the race result, and comparing this to the information that would have been transferred if we'd received the race result in a single step.

So the message in the first step, that the winner isn't the first person, carried $2$ bits of information. If in the second step we learn who the winner is, this will now carry an additional $4$ bits of information, since no matter who it is, the probability for this will be $\frac1{16}=2^{-4}$.

Had we learned in a single step who the winner is, this message would have carried $6$ bits of information, since the probability for this person to win was $\frac1{64}=2^{-6}$.

Thus, whether we receive the information in two steps or one, it's always $6$ bits of information in total.
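The two-step versus one-step accounting can be verified directly (a quick Python check):

```python
from math import log2

step1 = -log2(1/4)    # "winner is not runner 1": 2 bits
step2 = -log2(1/16)   # winner then revealed among the remaining 16: 4 bits
single = -log2(1/64)  # winner revealed in a single step: 6 bits

print(step1 + step2 == single)  # True
```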

We can prove this more generally: Say we learn that event $A$ occurred, and then later that event $B\subseteq A$ occurred. The first message contains $-\log_2P(A)$ bits of information. The second message contains

$$ -\log_2P(B\mid A)=-\log_2\frac{P(B\cap A)}{P(A)}=-\log_2\frac{P(B)}{P(A)}=\log_2P(A)-\log_2P(B) $$

bits of information. Had we learned in a single step that $B$ occurred, the message would have contained $-\log_2P(B)$ bits of information, the sum of the amounts of information received in the two steps.

This also works if $B\not\subseteq A$. In this case after receiving the second message that $B$ occurred, we in fact know that $B\cap A$ occurred. The first step again transmits $-\log_2P(A)$ bits, the second step now transmits $\log_2P(A)-\log_2P(B\cap A)$ bits, and the sum is again the information we would have gained upon learning in a single step that $B\cap A$ occurred, namely $-\log_2P(B\cap A)$.
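Both cases of the general argument can be checked numerically on the race itself (a sketch; `info` is just a hypothetical helper name for self-information, and runner 5 is an arbitrary choice):

```python
from math import log2, isclose

def info(p):
    """Self-information -log2(p), in bits."""
    return -log2(p)

# Case B ⊆ A: A = "not runner 1" (P = 1/4), B = "runner 5 wins" (P = 1/64).
P_A, P_B = 1/4, 1/64
two_steps = info(P_A) + (info(P_B) - info(P_A))
assert isclose(two_steps, info(P_B))  # 6 bits either way

# Case B ⊄ A: B = "runner 1 or runner 5 wins"; then B ∩ A = "runner 5 wins".
P_BA = 1/64
second_step = info(P_BA) - info(P_A)  # log2 P(A) - log2 P(B ∩ A)
assert isclose(info(P_A) + second_step, info(P_BA))

print("both cases add up")
```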