I gather that for a series of four fair coin flips, if we get $H,H,H,H$, this has a probability of $\tfrac{1}{16}$, so we have information content $$\log_2 \frac{1}{1/16} = \log_2 16 = 4 \text{ bits.}$$
But for the complementary event (anything other than $H,H,H,H$), which has probability $\tfrac{15}{16}$, we have $$\log_2\frac{1}{15/16} = \log_2\frac{16}{15} \approx 0.093 \text{ bits,}$$
which is much smaller. But why? Shouldn't a more probable event give us more information? I would think the occurrence of a more probable event tells us the system behaves as we expect and in this sense "informs" us.
My Question:
Why does the information content of a less probable event yield more bits than a more probable one?
It is quite the opposite. Imagine an event that occurs with probability $1$. Then learning that it has occurred reveals nothing you did not already know.
If, on the other hand, a very rare event occurs, learning of it tells you something unexpected, and in that sense it reveals a lot of information you could not have deduced yourself.
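To make this concrete, here is a minimal sketch (names are my own, not from the post) that computes the self-information $\log_2(1/p)$ for the events discussed above:

```python
import math

def self_information(p: float) -> float:
    """Self-information (surprisal) of an event with probability p, in bits."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1 / p)

# Four heads in a row with a fair coin: p = 1/16
print(self_information(1 / 16))    # 4.0 bits

# The complementary event: p = 15/16, roughly 0.093 bits
print(self_information(15 / 16))

# A certain event carries no information
print(self_information(1.0))       # 0.0 bits
```

Note how the bits grow without bound as $p \to 0$ and shrink to zero as $p \to 1$, matching the intuition that rarer outcomes are more surprising.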