Implement a fair 0-1 random variable using a biased coin with head probability $p$, while bringing the expected number of coin tosses down to $1/H(p)$


I recently encountered an intriguing question during a mathematical interview: How can one construct a fair binary random variable that outputs either $0$ or $1$ using a biased coin with a head-up probability of $ p $, while minimizing the expected number of coin tosses?

I am aware of the conventional method for simulating a fair coin from a biased one. This method involves the following steps:

  1. Toss the biased coin twice.
  2. If the outcomes are both heads (HH) or both tails (TT), disregard the results and start anew.
  3. If the outcomes are heads-tails (HT), output $0$.
  4. If the outcomes are tails-heads (TH), output $1$.

With this approach, the expected number of coin tosses $ n $ satisfies: \begin{align*} n &= 2 + (p^2 + (1 - p)^2) n \\ &\Rightarrow n = \frac{2}{2p(1 - p)} = \frac{1}{p(1-p)} \end{align*}
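For concreteness, here is a minimal Python sketch of the procedure above (the function name and the empirical check are my own), which also estimates the expected cost $1/(p(1-p))$:

```python
import random

def von_neumann_fair_bit(p, rng=random.random):
    """von Neumann's trick: toss the biased coin in pairs until the
    two outcomes differ; HT -> 0, TH -> 1.
    Returns (fair_bit, number_of_tosses_used)."""
    tosses = 0
    while True:
        a = rng() < p  # True = heads, with probability p
        b = rng() < p
        tosses += 2
        if a != b:
            # P(HT) = P(TH) = p(1-p), so both outputs are equally likely
            return (0 if a else 1), tosses

# Empirical check of the expected cost 1/(p(1-p)):
p = 0.3
samples = [von_neumann_fair_bit(p) for _ in range(100_000)]
avg_tosses = sum(t for _, t in samples) / len(samples)
print(avg_tosses, 1 / (p * (1 - p)))  # both should be close to 4.76
```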

However, the interviewer raised a more challenging question: devise and implement a strategy such that the expected number of coin tosses per fair bit approaches the optimal $ \frac{1}{H(p)} $, where $ H(p) = -p \log_2(p) - (1 - p) \log_2(1 - p) $ is the binary entropy.

I suspect that this is a systematically solvable problem. Are there any published works that delve into this issue?


There are 2 best solutions below


Just do arithmetic coding.

Sketch: starting from the real interval $[0,1]$, let each coin outcome split the current interval into two parts of lengths $1-p$ and $p$. Thus $T\equiv 0$ (probability $1-p=q$) corresponds to the interval $[0,q]$ and $H\equiv 1$ (probability $p$) corresponds to $[q,1]$ (length $p$). This is applied recursively, so that, e.g., the sequence $THT\equiv 010$ is assigned the interval $[q^2,\, q^2(1+p)]$ (length $q^2 p$). Whenever the current interval falls entirely inside a dyadic interval of length $2^{-k}$, we output the $k$ binary digits that identify that interval.

If we throw $n$ coins and observe $n_H$ heads and $n_T$ tails, the length of the interval will be $p^{n_H}(1-p)^{n_T}$, so the number of output bits satisfies $$k \approx -\big(n_H \log_2 p + n_T \log_2(1-p)\big) \approx -n\big(p \log_2 p + (1-p)\log_2(1-p)\big) = n\,H(p)$$

hence, by throwing $n$ biased coins we produce (on average) $k \approx n H(p)$ fair bits, with

$$\frac{n}{k} \approx \frac{1}{H(p)}$$
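The sketch above can be written in a few lines of Python. This is an illustrative implementation under my own naming, using floating-point intervals (a production implementation would use integer arithmetic with carry handling to avoid precision loss when the interval straddles $1/2$ for a long time):

```python
import random

def arithmetic_extract(flips, p):
    """Interval method from the sketch above: keep a current interval
    [lo, hi), split it in ratio (1-p):p on each biased flip (0 = T takes
    the left part, 1 = H the right), and emit a fair bit whenever the
    interval lies entirely inside one dyadic half, rescaling afterwards."""
    lo, hi = 0.0, 1.0
    out = []
    for f in flips:
        mid = lo + (1 - p) * (hi - lo)
        lo, hi = (mid, hi) if f else (lo, mid)
        while hi <= 0.5 or lo >= 0.5:  # interval inside [0,1/2) or [1/2,1)
            bit = 1 if lo >= 0.5 else 0
            out.append(bit)
            lo, hi = 2 * lo - bit, 2 * hi - bit  # rescale back to [0,1)
    return out

random.seed(1)
p = 0.2
flips = [1 if random.random() < p else 0 for _ in range(10_000)]
bits = arithmetic_extract(flips, p)
print(len(flips) / len(bits))  # should approach 1/H(0.2), roughly 1.39
```

Because interval lengths equal probabilities, the biased measure on flip sequences is pushed forward to the uniform measure on $[0,1]$, which is why the emitted binary digits are fair.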


Edit: as pointed out in the comments, there are two caveats:

  • This approach requires knowledge of $p$ (which the original method does not).

  • It also maps biased coins to fair coins in blocks (what in information theory is known as coding the extension of the source).

These caveats are unavoidable if we want to achieve the optimal rate ($1/H(p)$).


Edit 2: Knowing $p$ is probably not necessary (though in practice it makes little difference). See for example https://www.jstor.org/stable/2240383


A counter-example showing that, when we want to simulate a single fair coin flip, we cannot always reach the information-theoretic lower bound:

Let $\{X_i\mid i\in\mathbb N\}$ be a set of i.i.d. random variables mapping to $\{0,1\}$ with $\mathbb P(X_i=1)=p$. Identify $w_1\dots w_n=:w\in\{0,1\}^*$ with the event $\{\bigwedge_{i=1}^n X_i=w_i\}$ and let $|w_1\dots w_n|:=n$.

Then we're looking for a prefix-free code $W\subseteq \{0,1\}^*$ and a map $Y:W\to\{0,1\}$ such that $\mathbb P(Y^{-1}(0))=\mathbb P(Y^{-1}(1))=0.5$, and which minimizes the expected length $E(W):=\sum_{w\in W}|w|\, \mathbb P(w)$.

Given $W':= \{w\in W\mid |w|\le k\}$ for some $k$, we can get a lower bound on $E(W)$ by $$ E(W)=\sum_{i=1}^\infty \sum_{\substack{w\in W\\ |w|=i}} |w|\, \mathbb P(w) = E(W') + \sum_{i=k+1}^\infty \sum_{\substack{w\in W\\ |w|=i}} |w|\, \mathbb P(w) \ge E(W') + (k+1)\cdot \sum_{i=k+1}^\infty \sum_{\substack{w\in W\\ |w|=i}} \mathbb P(w) = E(W') + (k+1)\big(1-\mathbb P(W')\big) $$
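As a small numeric illustration of this truncation bound (the helper names are my own, not part of the argument), one can evaluate it for von Neumann's code truncated at $k=2$, i.e. $W'=\{01,10\}$:

```python
def prob(w, p):
    # probability that the i.i.d. biased flips spell out the word w
    # (convention: "1" = heads with probability p)
    return p ** w.count("1") * (1 - p) ** w.count("0")

def truncated_bound(W_prime, p, k):
    """E(W) >= E(W') + (k+1) * (1 - P(W')), with W' the words of W of
    length <= k, as derived above."""
    EW = sum(len(w) * prob(w, p) for w in W_prime)
    PW = sum(prob(w, p) for w in W_prime)
    return EW + (k + 1) * (1 - PW)

# Von Neumann's code truncated at k = 2: W' = {"01", "10"}
p = 0.49
bound = truncated_bound(["01", "10"], p, 2)
print(bound, 1 / (p * (1 - p)))  # the bound is below the true expectation
```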

Now take $p_n<0.5$ for $n\in\mathbb N$ with $p_n\to 0.5$; in the limit only the lengths of the words in $W$ matter. Because $p_n<0.5$, we can have at most one word of length 1 in $W$ (if both length-1 words were in $W$, then $W=\{0,1\}$ and the two outputs would have probabilities $p_n$ and $1-p_n$, neither equal to $0.5$). Therefore we can make a case distinction:

Case 1: $|\{w\in W\mid |w|=1\}|=0$
In this case every word has length at least $2$, so $E(W)\ge 2\cdot 1 = 2$

Case 2: $|\{w\in W\mid |w|=1\}|=1$
In this case, taking the limit $p_n\to 0.5$ (so the single length-1 word has probability $0.5$), $E(W)\ge 1\cdot 0.5 + 2\cdot 0.5 = 1.5$

So, no matter which code $W$ we choose, in the limit we always have $E(W)\ge 1.5> \frac1{H(0.5)}=1$
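To make the comparison concrete, here is a quick numeric check (helper names are mine) of the Case 2 bound, which for general $p$ reads $E(W)\ge 1\cdot\max(p,1-p)+2\cdot(1-\max(p,1-p)) = 2-\max(p,1-p)$, against $1/H(p)$ as $p\to 0.5$:

```python
import math

def entropy(p):
    # binary entropy H(p) in bits
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Case 2 bound from above: at most one word of length 1, of probability
# at most max(p, 1-p), so E(W) >= 2 - max(p, 1-p).
for p in (0.45, 0.49, 0.499):
    print(p, 2 - max(p, 1 - p), 1 / entropy(p))
# The bound tends to 1.5 while 1/H(p) tends to 1 as p -> 0.5.
```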