How to think about self-information?


Recently, I've been struggling a bit with the concept of self-information. In particular, I don't know how I'm supposed to interpret an event "having more information" than another.

For instance, if I have a weighted coin such that $P(\text{tails}) = 0.9$ and $P(\text{heads}) = 0.1$, then

$$ I(\text{tails}) = \log_2\left(\frac{1}{0.9}\right) \approx 0.152 < 3.3219 \approx \log_2\left(\frac{1}{0.1}\right) = I(\text{heads}) $$

So flipping the coin and getting heads "gives more information" than getting tails. However, in what sense is this the case? Like, what does getting heads give me more information about? Is it just a way to say that an event is "more surprising" than another one?
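For concreteness, the two numbers above can be checked with a short sketch (the function name `self_information` is just an illustrative choice):

```python
import math

def self_information(p: float) -> float:
    """Self-information, in bits, of an event with probability p."""
    return -math.log2(p)

# Weighted coin from the question: P(tails) = 0.9, P(heads) = 0.1
print(self_information(0.9))  # ≈ 0.152 bits for tails
print(self_information(0.1))  # ≈ 3.322 bits for heads
```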

Best Answer

I learned "self-information" as "the surprise function", so it is indeed a way of quantifying how surprising an event is. The surprise $s(A)=-\log_2\mathbb{P}(A)$ is the unique function that satisfies:

  • $s(A)$ depends continuously on $A\mapsto \mathbb{P}(A)$
  • $s(A)$ is decreasing in $A\mapsto \mathbb{P}(A)$
  • $s(A\cap B)=s(A)+s(B)$ for independent $A,B$
  • $s(A)=1$ if $\mathbb{P}(A)=0.5$
  • $s(A)=2$ if $\mathbb{P}(A)=0.25$

Thinking about why each of these conditions is required and desirable (bar the last two, which merely fix the base and scale) gives insight into self-information.
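The defining properties above are easy to verify numerically. A minimal sketch (using a helper `s` named after the surprise function in the answer):

```python
import math

def s(p: float) -> float:
    """Surprise of an event with probability p, in bits."""
    return -math.log2(p)

# Additivity for independent events: the surprise of two independent
# heads (probability 0.1 each) equals the sum of the individual surprises.
print(s(0.1 * 0.1))        # joint event: heads then heads
print(s(0.1) + s(0.1))     # sum of the two surprises (same value)

# Normalization conditions that fix the base and scale:
print(s(0.5))   # 1.0
print(s(0.25))  # 2.0
```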

To directly answer your question, though, there is a sense in which getting heads tells you more than getting tails. Consider a crime scene where a blood sample is tested and found to be a type that $10\%$ of the population has. That result narrows down the field far more than a match with a type $90\%$ of the population shares. Information theory often works well through this "crime scene" lens of "narrowing down suspects".
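The "narrowing down suspects" intuition can be made quantitative: the bits gained equal $\log_2$ of the factor by which the suspect pool shrinks, which is exactly the self-information of the matching event. A sketch (the pool size of 1000 is a hypothetical choice; the result does not depend on it):

```python
import math

def narrowing_bits(match_fraction: float) -> float:
    """Bits of information gained when evidence narrows the suspect
    pool down to the given fraction of the population."""
    population = 1000  # hypothetical pool size; cancels out below
    remaining = population * match_fraction
    return math.log2(population / remaining)

# Rare blood type (10% of the population) vs a common one (90%):
print(narrowing_bits(0.10))  # ≈ 3.32 bits, same as I(heads)
print(narrowing_bits(0.90))  # ≈ 0.15 bits, same as I(tails)
```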