How to generate eaually probable random integer from another random integer distribution?

53 Views Asked by At

Let $X_n, n\in \Bbb N$ be the random integer distribution with $$P(X_n=i)=1/n\cdot I(i\in[0, n)\cap \Bbb Z).$$

Question: in general, how to generate $X_n$ from $X_m$ when $m\ne n$?

Example: how to get $X_7$ out of $X_5$? Apparently one sample isn't sufficient because the state space has 5 states which cannot be divided into 7 equally sized subsets. My first thought would be to generate a state space whose size is a multiple of 7 via multiple iid samples of $X_5$. But it seems $5^k$ is never divisible by $7$. So even worse, we can never get it using only finitely many samples. We must design a kind of "hit some condition then stop" experiment to generate an infinite state space. So here's my shot: run $X_5$ twice and record the ordered pair $(i,j)$. That makes 25 outcomes. Impose an ordering on these 25 pairs. Now, if the outcome belongs in the first 4 pairs, discard them and the procedure starts over. If not, then note that the remaining 21 pairs can be divided into 7 equally probable subsets.

Is there any other perhaps more efficient way to do this?

2

There are 2 best solutions below

2
On

Suppose, as a result of previously generated numbers, you are in one of $k$ equally-probable "states", where $1 \le k < n$ Take a new sample from $X_m$: now you are in one of $km$ equally-probable states, let's say state $z$ (in some previously determined ordering), where $0 \le z < k m$. Let $km = q n + r$, $0 \le r < n$. If $z < q n$, return $\lfloor z/q \rfloor$. Otherwise you are in one of $r$ equally-likely states, and the process continues.

Let $T(k)$ be the expected number of samples of $X_m$ used in this process starting with $k$ states. Then $$ T(k) = 1 + \frac{r}{km} T(r),\ r \equiv km \mod n $$

Thus with $m=5$ and $n=7$ we get $$ \eqalign{T(1) &= 1 + \frac{5}{5} T(5)\cr T(5) &= 1 + \frac{4}{25} T(4)\cr T(4) &= 1 + \frac{6}{20} T(6)\cr T(6) &= 1 + \frac{2}{30} T(2)\cr T(2) &= 1 + \frac{3}{10} T(3)\cr T(3) &= 1 + \frac{1}{15} T(1)\cr}$$ and solving these we find that the expected number of samples needed is $T(1) = 1115/504$.

6
On

The following procedure for $X_5$ and $X_7$ generalizes.

Write the fractions $i\over7$ as “decimals” in base $5$.

0/7=0.000...
1/7=0.032412033...
2/7=0.120324122...
3/7=0.203241203...
4/7=0.241203241...
5/7=0.324120324...
6/7=0.412032412...

Generate the digits of a base-5 “decimal” number $x$ by choosing repeatedly from $X_5$. If the number you generate is between $k\over7$ and $k+1\over7$, your value from $X_7$ is $k$. You don’t often need to generate many digits to determine between which two fractions the number $x$ lies.

For example, if you drew $\color{blue}3$, then $\color{blue}0$, $x=0.\color{blue}{30}...$, and you can be certain your result from $X_7$ is $4$, because $${4\over7}=0.241203241_5\dots < 0.\color{blue}{30}000_5\dots < 0.\color{blue}{30}444..._5 < 0.324120324_5\dots={5\over7}.$$

After $d$ draws ($5^d$ possibilities), there are at most $6$ different draw patterns that leave you unsure of your result, so you fail to have an answer after $d$ draws with probability at most $6\over{5^d}$. So $4$ draws would be enough more than $99$% of the time.