Recently, I've been struggling a bit with the concept of self-information. In particular, I don't know how I'm supposed to interpret an event "having more information" than another.
For instance, if I have a weighted coin such that $P(tails) = 0.9$ and $P(heads) = 0.1$, then
$$ I(tails) = \log_2\left(\frac{1}{0.9}\right) \approx 0.152 < 3.3219 \approx \log_2\left(\frac{1}{0.1}\right) = I(heads) $$
So flipping the coin and getting heads "gives more information" than getting tails. However, in what sense is this the case? Like, what does getting heads give me more information about? Is it just a way to say that an event is "more surprising" than another one?
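The numbers above are easy to reproduce; here is a quick check in Python (the function name is my own, not standard notation):

```python
import math

def self_information(p: float) -> float:
    """Self-information (surprisal) of an event with probability p, in bits."""
    return -math.log2(p)

print(self_information(0.9))  # tails: ~0.152 bits
print(self_information(0.1))  # heads: ~3.3219 bits
```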
I learned "self-information" as "the surprise function", so this is definitely a way of quantifying that an event is more surprising. It ($s(A)=-\log_2\mathbb{P}(A)$) is the unique function that satisfies:

- $s(A)$ depends on $A$ only through $\mathbb{P}(A)$, and is continuous and decreasing in $\mathbb{P}(A)$ (rarer events are more surprising);
- $s(A \cap B) = s(A) + s(B)$ whenever $A$ and $B$ are independent (surprise adds up over independent observations);
- the logarithm base is $2$, so the units are bits;
- the leading coefficient is $1$, i.e. $s(A) = 1$ when $\mathbb{P}(A) = \tfrac{1}{2}$.
Thinking about why each of these conditions is required and desirable (bar the last two, which merely fix the base and coefficient) gives insight into self-information.
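As a quick numerical sanity check on the additivity condition (a sketch using your weighted coin, not part of any standard derivation): the surprise of two independent tails in a row equals the sum of the individual surprises.

```python
import math

def s(p: float) -> float:
    """Surprise (self-information) in bits of an event with probability p."""
    return -math.log2(p)

p_tails = 0.9
# Two independent tails have probability 0.9 * 0.9 = 0.81,
# and their combined surprise is the sum of the individual surprises.
assert math.isclose(s(p_tails * p_tails), s(p_tails) + s(p_tails))
print(s(p_tails * p_tails))  # ~0.304 bits, twice the single-flip surprise
```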
To directly answer your question, though, there is a sense in which getting heads tells you more than getting tails. Consider a crime scene where a sample of blood is tested and found to be a type that $10\%$ of the population has. That narrows the suspect pool to a tenth of its former size, which is far more informative than a match with a type that $90\%$ of the population has. Information theory often works well through this "crime scene" lens of narrowing down suspects.
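The "narrowing down suspects" picture can be made concrete with a toy calculation (the population size here is a hypothetical number I chose for illustration). Each bit of information halves the remaining suspect pool:

```python
import math

population = 1_000_000  # hypothetical suspect pool

for match_fraction in (0.9, 0.1):
    bits = -math.log2(match_fraction)        # information gained from the match
    remaining = population * match_fraction  # suspects still consistent with the evidence
    print(f"match fraction {match_fraction}: {bits:.3f} bits, "
          f"{remaining:.0f} suspects remain")
```

A $10\%$ match yields about $3.32$ bits and leaves $100{,}000$ suspects, while a $90\%$ match yields only about $0.15$ bits and leaves $900{,}000$: the rarer observation does more of the detective's work.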