Mapping randomness to binary


I'm working on an algorithm to map entropy from random events (coin flips, dice, user input, timing delays, or any arbitrary random event) to binary, in a deterministic and reversible manner, without introducing any bias.

For example:

  • Using coin flips is easy: tails <=> 0 and heads <=> 1, because every coin flip gives you exactly 1 bit of entropy.
  • Using an 8-sided die is also easy: 1 <=> 000 through 8 <=> 111, because every roll of an 8-sided die gives you exactly 3 bits of entropy.

These examples are powers of two (2^N). But what about events that do not fit nicely into 2^N? For example, a regular 6-sided die: log2(6) ≈ 2.58 bits.

So a single roll of a 6-sided die does not carry enough entropy for 3 bits, but it does carry enough for at least 2 bits (since we have ≈ 2.58 bits of entropy). So how do we get at least 2 bits of entropy from a 6-sided die?

Option 1 - The mapping could be:

1 <=> 00

2 <=> 01

3 <=> 10

4 <=> 11

5 <=> ?

6 <=> ?

If we rolled a 5 or 6, we could ask the user to roll again (or, in a general sense, for a computer program, we could error out, or discard and collect new data for entropy). This option seems a bit wasteful because we are discarding entropy we could have used, and we are re-rolling, or re-collecting more entropy.
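Option 1 is classic rejection sampling, and can be sketched in a few lines. This is a minimal illustration, assuming a fair die; `roll_d6` is a stand-in for whatever entropy source is available:

```python
import secrets

def two_bits_from_d6(roll_d6):
    """Rejection sampling: keep rolling until the result is in 1..4,
    then map it to 2 unbiased bits. Conditioned on acceptance, the
    four outcomes are equally likely, so the bits are unbiased.
    roll_d6 is any callable returning a uniform integer in 1..6."""
    while True:
        r = roll_d6()
        if r <= 4:
            return r - 1  # 1..4 -> 0..3, i.e. 00, 01, 10, 11

# Example with a simulated fair die:
bits = two_bits_from_d6(lambda: secrets.randbelow(6) + 1)
assert 0 <= bits <= 3
```

Each roll is rejected with probability 1/3, so on average this consumes 1.5 rolls (≈ 3.88 bits of source entropy) per 2 output bits, which quantifies the waste the question mentions.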

Option 2 could be to truncate:

1 <=> 000

2 <=> 001

3 <=> 010

4 <=> 011

5 <=> 100

6 <=> 101

However, this does not work. The least significant bit is fine: it has a 50% chance of being 0 or 1. But the middle bit has a 4/6 chance of '0' and a 2/6 chance of '1', and the most significant bit is likewise not 50/50 (4/6 vs 2/6). One explanation is that we are inflating the ≈ 2.58 bits of entropy from a 6-sided die into 3 bits, so we inevitably introduce bias. We can only use 1 bit of entropy here (the least significant bit); and if the source of entropy does not have an even probability distribution, we may not even be able to get that 1 bit.
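The bias in the truncation mapping can be verified by exhaustive counting over the six equally likely outcomes (a quick check, not part of any extraction scheme):

```python
# Truncation mapping 1..6 -> 000..101: count how often each bit
# position is '1' across all six equally likely codewords.
codes = [format(face - 1, '03b') for face in range(1, 7)]
# -> ['000', '001', '010', '011', '100', '101']
for pos in range(3):
    ones = sum(code[pos] == '1' for code in codes)
    print(f"bit {pos}: P(1) = {ones}/6")
# bit 0 (MSB) and bit 1 (middle) come out 2/6, i.e. biased;
# only bit 2 (LSB) comes out 3/6, i.e. unbiased.
```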

Even if we ask the user to roll a 6-sided die 1 time (≈ 2.58 bits), 2 times (≈ 5.17), 3 (≈ 7.75), 4 (≈ 10.34), 5 (≈ 12.92), ... 10 times (≈ 25.85 bits), we never get a whole number of bits. So do we make them discard all of the dice rolls? Surely not. We should be able to collect 25 bits (or slightly fewer?) of entropy from 10 rolls of a 6-sided die. If so, how?
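One possible way to get a whole number of bits out of several rolls is to batch them: treat n rolls as one base-6 integer in [0, 6^n) and apply rejection sampling over the combined range. This is only a sketch of that idea (the helper name and interface are made up for illustration), not a statement of the optimal method:

```python
import math

def bits_from_rolls(rolls, k=None):
    """Combine n fair d6 rolls (values 1..6) into one uniform integer
    in [0, 6**n), then try to extract k bits, where k is the largest
    whole number with 2**k <= 6**n (k = 25 for n = 10). Combined
    values >= 2**k are rejected, so accepted values are uniform on
    [0, 2**k) and the output bits are unbiased."""
    n = len(rolls)
    if k is None:
        k = int(n * math.log2(6))  # e.g. int(25.85) = 25 for n = 10
    value = 0
    for r in rolls:
        value = value * 6 + (r - 1)  # base-6 accumulation
    if value >= 2**k:
        return None  # reject; caller collects fresh rolls
    return format(value, f'0{k}b')
```

For n = 10 the acceptance probability is 2^25 / 6^10 ≈ 0.55, so this particular batching is actually more wasteful than per-roll rejection; a rejected value minus 2^25 is still uniform on a smaller range and could in principle be recycled, which is the direction arithmetic-coding-style extractors take.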

1 Answer

Please check Shannon–Fano coding, and also Huffman coding. The notion of entropy is inherently asymptotic: if you adopt the optimal encoding to represent $X$, then for a long enough sequence $\{X_i\}_{i=1}^{n}$ (as $n\rightarrow \infty$), the length of the representation tends to $nH(X)$.

In practice, additional concerns must be taken into account. Trivially, any binary sequence must be uniquely decodable. In the case of a fair die, you may adopt the following encoding:

$1: 00, 2: 01, 3: 100, 4: 101, 5: 110, 6: 111$

which results in an average length of $16/6 \approx 2.66$ bits per roll for a long enough sequence. The overhead of $2.66-2.58=0.08$ bits is the price of unique decodability. For example, if you want to decode:

$001110100010001101...$

it can only be decoded as:

$00, 111, 01, 00, 01, 00, 01, 101, ...$

that is: $1, 6, 2, 1, 2, 1, 2, 4, ...$
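The encoding and decoding steps above can be sketched directly; since no codeword is a prefix of another, a greedy left-to-right scan decodes unambiguously (a minimal sketch of the code given in the answer, not a full Huffman implementation):

```python
# Prefix code from the answer: die face -> bit string.
CODE = {1: '00', 2: '01', 3: '100', 4: '101', 5: '110', 6: '111'}
DECODE = {v: k for k, v in CODE.items()}

def encode(faces):
    """Concatenate the codewords for a sequence of die faces."""
    return ''.join(CODE[f] for f in faces)

def decode(bits):
    """Greedy scan: because the code is prefix-free, the first
    buffered string that matches a codeword is the right one."""
    faces, buf = [], ''
    for b in bits:
        buf += b
        if buf in DECODE:
            faces.append(DECODE[buf])
            buf = ''
    return faces

# The worked example from the answer round-trips:
assert encode([1, 6, 2, 1, 2, 1, 2, 4]) == '001110100010001101'
assert decode('001110100010001101') == [1, 6, 2, 1, 2, 1, 2, 4]
```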