Number of bits from entropy calculations


To calculate the entropy of an event in a uniform distribution of this kind

  • Events: "red" or "blue"
  • Probabilities: P(red) = 0.5, P(blue) = 0.5

We just do $e = -\log_2(0.5)$

In this case it yields 1, which I guess means we need one bit.

If we have 4 values of 0.25, we need 2 bits. No surprises here.
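A quick sketch of the entropy formula $H = -\sum_i p_i \log_2 p_i$, just to confirm the two uniform cases above:

```python
import math

# Entropy H = -sum(p * log2(p)); for a uniform distribution this equals
# the number of bits needed per symbol.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs)

print(entropy([0.5, 0.5]))   # 1.0 -> one bit per symbol
print(entropy([0.25] * 4))   # 2.0 -> two bits per symbol
```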

However, let's say we have this non-uniform distribution:

  • Events: "red" or "blue"
  • Probabilities: P(red) = 0.125, P(blue) = 0.875

Now, "red" should be represented with 3 bits, is this correct? Why don't we still use just 1 bit?

On BEST ANSWER

"red" should be represented with 3 bits, is this correct?

No. You could say that "red" gives 3 bits of information; and you could also say that the "ideal" code should assign 3 bits to "red" and 0.19 bits to "blue", so that on average you use 0.54 bits, which is the entropy. But of course, you can't do that. The optimal code is still 1 bit for each symbol.
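The 3 bits, 0.19 bits, and 0.54 bits above are the self-informations $-\log_2 p$ of each event and their probability-weighted average (the entropy). A short check:

```python
import math

# Self-information of each event vs. the entropy (the average) of the
# source, for P(red) = 0.125 and P(blue) = 0.875 as in the question.
p_red, p_blue = 0.125, 0.875
info_red = -math.log2(p_red)    # 3.0 bits
info_blue = -math.log2(p_blue)  # ~0.1926 bits
entropy = p_red * info_red + p_blue * info_blue
print(entropy)                  # ~0.5436 bits per symbol on average
```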

To attain the entropy (and use fewer bits for blue, or blueish, symbols) you need to code the extensions of the source. See e.g. here.
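To illustrate the extension idea, here is a sketch (a minimal Huffman coder over pairs of symbols, i.e. the second extension of this source): coding pairs instead of single symbols already brings the average cost per original symbol below 1 bit, on its way toward the entropy of ~0.54 bits.

```python
import heapq
from itertools import product

def huffman_lengths(probs):
    """Return the Huffman code length (in bits) of each symbol."""
    # Heap entries: (probability, tie-break counter, symbols in this subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # each merge adds one bit to every member's code
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

source = {"red": 0.125, "blue": 0.875}

# Second extension: all ordered pairs, with product probabilities.
pairs = {a + "," + b: source[a] * source[b] for a, b in product(source, source)}

lengths = huffman_lengths(pairs)
avg_per_pair = sum(pairs[s] * lengths[s] for s in pairs)
print(avg_per_pair / 2)  # 0.6796875 bits per symbol, already below 1 bit
```

Coding longer blocks (the third, fourth, ... extensions) pushes the average length per symbol arbitrarily close to the entropy.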