Entropy and Probabilistic Algorithms


Recall entropy, from basic information theory:

The entropy of a probability distribution $D$ on a finite set $X$ is $$H(D)=\sum_{x\in X}{p(x) \cdot \log_2{\!(1/p(x))}}$$

I was able to prove that the maximum entropy of any distribution over $[n]$ is $\log{\!(n)}$, and that it is achieved by the uniform distribution. Also, I was able to prove that if we merge two outcomes of $D$, say $D=\{p_1,\dots,p_n\}$ and $D'=\{p_1,\dots,p_{n-2},p_{n-1} + p_n\}$, then $H(D')\leq H(D)$.
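Both facts are easy to check numerically. A minimal sketch (the function name `H` and the sample distribution are mine, chosen for illustration):

```python
import math

def H(ps):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return sum(p * math.log2(1 / p) for p in ps if p > 0)

D      = [0.5, 0.3, 0.2]
merged = [0.5, 0.3 + 0.2]       # group the last two outcomes
print(H(D), H(merged))          # merging never increases the entropy
```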

I need to show, using the above, that if I have a biased coin with probabilities $p$ and $1-p$ of the two outcomes, then to obtain a length-$k$ sequence of unbiased coin flips you need on average $k/H(p, 1-p)$ tosses of the biased coin.

My calculations are as follows. To simulate an unbiased coin with the biased one, I toss the coin twice: if the two tosses land on different faces, I take the first face as the answer; if they land on the same face, I discard both tosses and toss twice again. The expected number of attempts per unbiased bit is $1/(2p(1-p))$, so to get $k$ unbiased bits the expected number of attempts is $k/(2p(1-p))$. The entropy is $$H(p, 1 − p) = p\log\frac{1-p}{p} + \log\frac{1}{1-p} = p\log\frac{1}{p} + (1-p)\log\frac{1}{1-p},$$ but I can't make a connection between the entropy and my answer.
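This pair-tossing procedure can be simulated to check its expected cost; note that each attempt consumes two tosses of the biased coin. A minimal sketch (function name and parameters are illustrative):

```python
import random

def von_neumann_bits(p, k, seed=0):
    """Extract k unbiased bits from a coin with P(heads) = p using the
    pair-of-tosses trick: toss twice, keep the first face if the two
    tosses differ, otherwise discard both tosses and retry."""
    rng = random.Random(seed)
    bits, tosses = [], 0
    while len(bits) < k:
        a = rng.random() < p    # first biased toss
        b = rng.random() < p    # second biased toss
        tosses += 2
        if a != b:              # HT and TH are equally likely, so a is fair
            bits.append(int(a))
    return bits, tosses

p, k = 0.3, 100_000
bits, tosses = von_neumann_bits(p, k)
print(tosses / k)               # empirical tosses per unbiased bit
print(1 / (p * (1 - p)))        # two tosses per attempt, 1/(2p(1-p)) attempts
```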

2 Answers

Best answer:

Let $X$ be the source corresponding to the biased coin, and $Y$ the desired fair coin.

Your procedure needs $Z$ attempts, where $Z$ follows a geometric distribution with success probability $2p(1-p)$, so $E[Z]=1/(2p(1-p))$. And each attempt needs two instances of $X$. Hence, on average, to produce $k$ instances of $Y$ you need $$ \frac{k}{p(1-p)}$$ tosses of the biased coin.

You can see that your proposal is correct but highly inefficient by looking at the special case $p=1/2$: you need on average $4k$ tosses, when actually $k$ are enough.

Imagine instead that you take the sequence of $X$ and use it as input to an ideal binary code (that is, $Y$ is the perfect binary compression of $X$). Then you know, by Shannon's first theorem and its corollaries, that:

  • If $n$ input symbols result in $k$ coded symbols, then $k/n \to H(X)$
  • The coded symbols ($Y$) have maximal entropy, i.e., they correspond to a fair coin.

In conclusion, you need $n = k/H(X)=k/h(p)$ tosses of the biased coin.
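The two rates can be compared numerically; a short sketch, where `h` is the binary entropy used above:

```python
import math

def h(p):
    """Binary entropy H(p, 1-p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Tosses per unbiased bit: the Shannon-optimal 1/h(p)
# vs the pair-discarding scheme's 1/(p(1-p)).
for p in (0.1, 0.25, 0.5):
    print(f"p={p}: ideal {1/h(p):.2f}, pair scheme {1/(p*(1-p)):.2f}")
```

At $p=1/2$ the ideal rate is exactly one toss per bit, while the pair scheme still pays four.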

The graph displays the ideal $n/k=1/h(p)$ (in red) vs. the ratio produced by your algorithm.


You can devise a realizable implementation of this idea by arithmetic coding.

Second answer:

Let me provide some hints. Assume we flip the biased coin $n$ times, and let $p$ be the probability that a coin flip is 1. After $n$ coin flips, define the "likely" sequences of coin flips as the subset of $\{0,1\}^n$ consisting of the sequences with approximately $np$ ones. The number of such sequences satisfies $$\frac{1}{n+1}2^{nH(p)}\leq{n \choose np}\leq 2^{nH(p)}$$ (see Example 11.1.3 in Cover's book). Also, note that all of these likely sequences are (approximately) equiprobable.
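The stated bounds are easy to verify numerically for a single case where $np$ is an integer; a minimal sketch:

```python
import math

def h(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Check (1/(n+1)) * 2^{n h(p)} <= C(n, np) <= 2^{n h(p)} when np is an integer.
n, p = 100, 0.3
ones = int(n * p)               # np = 30
binom = math.comb(n, ones)
upper = 2 ** (n * h(p))
lower = upper / (n + 1)
assert lower <= binom <= upper
print(lower, binom, upper)
```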

Then, the sampling mechanism is as follows. Consider the length-$k$ sequences of "unbiased" coin flips, i.e., the elements of $\{0,1\}^k$. Then, similar to your strategy, consider a mapping from the ${n \choose np}$ "likely" sequences of biased coin flips to the $2^k$ sequences of unbiased coin flips. Matching the counts shows that $k \approx nH(p)$.

A valid question is what happens if, after $n$ biased coin flips, we do not get a likely sequence (this is similar to the retry in your solution). However, note that as $n$ becomes large, the probability that $n$ biased coin flips do not produce a likely sequence vanishes.
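This vanishing probability can be computed exactly from the binomial distribution; a sketch, assuming "likely" means the fraction of ones is within $\varepsilon = 0.05$ of $p$ (the tolerance and function names are mine):

```python
import math

def log_binom_pmf(n, k, p):
    """log of P(exactly k ones in n biased flips), via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def atypical_prob(n, p, eps=0.05):
    """P(|#ones/n - p| > eps): chance that n flips are NOT a likely sequence."""
    return sum(math.exp(log_binom_pmf(n, k, p))
               for k in range(n + 1) if abs(k / n - p) > eps)

# The probability of missing the likely set shrinks rapidly as n grows.
for n in (100, 500, 2000):
    print(n, atypical_prob(n, 0.3))
```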