A few days ago a group of 8 of us had to split into two teams of 4 for the purposes of competing in a trivia night. Notwithstanding that it was far from optimal as a trivia-winning strategy, we decided to split into teams randomly, but we only had a coin as a practical source of randomness.
Thus the question that arose was this:
Given a set $\{1, \dotsc, 2N\}$ and a sequence $X_1, X_2, \dotsc$ of fair coin flips, what is a practical way of sampling a a subset with exactly $N$ elements uniformly at random?
What we implemented on the night was incorrect, but thinking about it later, what we were trying to do was the following sequential procedure in each step of which we were assigned into one of group $A$, group $B$ or unassigned $U$.
Formally, we began with $(A_0, B_0, U_0) = (\varnothing, \varnothing, \{1, \dotsc, 2N\})$. Then, we would iterate the following over $k$:
- Everyone in $U_k$ would flip a coin, and assign $U_k^H = \{i \in U_k : i \text{ flipped } H\}, U_k^T = \{i \in U_k : i \text{ flipped } T\}$.
- Set $U_k^* = \mathop{\mathrm{argmin}}_{U \in \{U_k^H, U_k^T\}} \bigl||U \cup A_k| - N\bigr|$.
- Reassign the groups according to $$ (A_{k+1}, B_{k+1}, U_{k+1}) = \begin{cases} (A_k \cup U_k^*, B_k, U_k \setminus U_k^*) & \text{if } |U_k^* \cup A_k| \leq N, \\ (A_k, B_k \cup (U_k \setminus U_k^*), U_k^*) & \text{if } |U_k^* \cup A_k| > N. \end{cases} $$
- Stop when $U_k = \varnothing$.
My questions are then:
- Does this actually sample a subset of size $N$ uniformly at random? My guess is yes, since it seems to me that if the set were permuted before the process began, then no distributions would change, but this is hardly a formal argument.
- Is this optimal in terms of a reasonable cost function (like total number of coins flipped or total number of iterations)? My guess is no, but I can't think of anything better.
- Is there a way of doing this that is more practical to physically implement?
I propose this solution (that I think simpler and may require less number of coins flipped)
Suppose $K$ is the smallest number satisfying: $2^K>2N$.
Each person $P_i$ for $i=1,...,2N$ flips the coin exactly $K$ times and the result is represented by $K$ random Bernoulli variables $(a_{i,0},a_{i,1},...,a_{i,K-1})$ with $a_{i,k} = 0$ if $H$ and $a_{i,k} = 1$ if $T$.
Define the random function $$f(P_i) = \sum_{k=0}^{K-1}a_{i,k}\cdot2^{k} \qquad \forall i=1,...,2N$$ By definition, the function $f$ receive value from $0$ to $2^{K}>N$. There are chances that some $f(P_i)$ can receive same value. If this scenario occurs, we start again the procedure, until all $(f(P_i))_{i=1,..,2N}$ have different values. The procedure will stop almost surely because the probability that all $(f(P_i))_{i=1,..,2N}$ have different values is strictly positive (at least greater than $\frac{1}{2^{2N}})$).
The obtained $2N$ random functions $(f(P_i))_{i=1,..,2N}$ are i.i.d.
It suffices now to order the values of $2N$ random functions$(f(P_i))_{i=1,..,2N}$ and choose the first $N$ smallest (or biggest) values among them and we can split the group into two equal teams using only a coin (the coin doesn't need to be fair or not).