I recently encountered an intriguing question during a mathematical interview: How can one construct a fair binary random variable that outputs either $0$ or $1$ using a biased coin with a head-up probability of $ p $, while minimizing the expected number of coin tosses?
I am aware of the conventional method for simulating a fair coin from a biased one. This method involves the following steps:
- Toss the biased coin twice.
- If the outcomes are both heads (HH) or both tails (TT), disregard the results and start anew.
- If the outcomes are heads-tails (HT), output $0$.
- If the outcomes are tails-heads (TH), output $1$.
With this approach, the expected number of coin tosses $ n $ satisfies: \begin{align*} n &= 2 + \left(p^2 + (1 - p)^2\right) n \\ \Rightarrow\ n &= \frac{2}{1 - p^2 - (1 - p)^2} = \frac{2}{2p(1 - p)} = \frac{1}{p(1 - p)} \end{align*}
However, the interviewer then raised a more challenging problem: devise and implement a strategy whose expected number of coin tosses per output bit approaches the optimal $ \frac{1}{H(p)} $, where $ H(p) = -p \log_2(p) - (1 - p) \log_2(1 - p) $ is the binary entropy.
I suspect that this is a systematically solvable problem. Are there any published works that delve into this issue?
Just do arithmetic coding.
Sketch: start from the real interval $[0,1]$ and let each coin toss split the current interval into two parts of lengths $1-p$ and $p$, recursively. Thus $T\equiv 0$ (probability $1-p=q$) corresponds to the interval $[0,q]$ and $H\equiv 1$ (probability $p$) corresponds to the interval $[q,1]$ (length $p$). Applying this recursively, e.g., the sequence $THT\equiv 010$ is assigned to the interval $[q^2, q^2(1+p)]$ (length $q^2 p$). Whenever the current interval falls inside a dyadic interval of length $2^{-k}$, we output the $k$ binary digits that identify that interval.
If we toss $n$ coins, yielding $n_H$ heads and $n_T$ tails, the length of the interval will be $p^{n_H} (1-p)^{n_T}$, so $$-k \approx n_H \log_2 p + n_T \log_2(1-p) \approx n\bigl(p \log_2 p + (1-p)\log_2(1-p)\bigr) = -n\,H(p),$$
hence, by tossing $n$ biased coins we produce (on average) $k \approx n\,H(p)$ fair bits, with
$$\frac{n}{k} \approx \frac{1}{H(p)}$$
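To put numbers on the gap (my own illustration, not from the thread): for $p = 0.7$,
\begin{align*}
H(0.7) &= -0.7\log_2 0.7 - 0.3\log_2 0.3 \approx 0.881,\\
\frac{1}{p(1-p)} &= \frac{1}{0.21} \approx 4.76 \text{ tosses per fair bit (original method)},\\
\frac{1}{H(0.7)} &\approx 1.13 \text{ tosses per fair bit (entropy bound)}.
\end{align*}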
Edit: as pointed out in the comments, there are two caveats:
- This approach requires knowledge of $p$ (not required in the original method).
- It also maps biased coins to fair coins in blocks (what information theory calls "coding an extension of the source").

These are inescapable if we want to achieve the optimal rate ($1/H(p)$).
Edit 2: Knowing $p$ is probably not necessary (though in practice it makes little difference). See for example https://www.jstor.org/stable/2240383