For fun, I am trying to use a two-step stochastic process to simulate an existing distribution $\text{P}(x_i) = p_i > 0$ on some discrete set $x_1, ..., x_n$. The process is as follows.
- Pick an element $x_i$ with probability $1/n$.
- With some probability $d_i < 1$, "drop" $x_i$ and go back to step #1. Else, select $x_i$ as the "returned" element.
Define $\text{P}'(x_i)$ to be the probability that this process returns $x_i$. I want to know two things:
- what distributions $\text{P}$ can be matched using this stochastic process, and
- how do I characterize these distributions?
Specifically, I am seeking the necessary and sufficient conditions for a distribution $\text{P}$ such that some $d_i$ exist for which $\text{P}$ and $\text{P}'$ are the same distribution.
Edit: all math below this point is based on an incorrect expression for $\text{P}'(x_i)$.
I have made some progress. The probability of the stochastic process returning a particular element $x_i$ is the probability that it gets selected and returned in the first iteration; or that any $x_j$ gets selected ($j\neq i$), but then dropped, on the first iteration--and then $x_i$ gets selected and is returned; etc. This is expressed as follows.
$$\text{P}'(x_i) = \frac{1}{n}(1 - d_i)\sum_{k = 0}^\infty \left( \left(\frac{n-1}{n}\right) \sum_{j \neq i}d_j \right)^k$$
For this to converge to some probability, it must be true that $((n-1)/n)\sum_{j \neq i}d_j < 1$ (since the sum is an infinite geometric series). This puts a constraint on each $d_i$.
If we assume that the $d_i$ are picked to satisfy this constraint, then, for all $i$,
$$\text{P}'(x_i) = \frac{1}{n}(1-d_i)\left(1 - \frac{n-1}{n}\sum_{j\neq i}d_j\right)^{-1}.$$
If we set $\text{P}(x_i) = \text{P}'(x_i)$ for all $i$, then rearranging the above equation yields the following linear equations for each $d_i$:
$$\text{P}(x_i)(n-1)\sum_{j\neq i}d_j - d_i = n\text{P}(x_i) - 1.$$
Since we can solve each linear equation for $d_i$, we can get additional constraints $0 \leq d_i < 1$, and then solve for $\sum_{i\neq j}d_j$. This yields, for each $d_i$:
$$\frac{n - \frac{1}{\text{P}(x_i)}}{n - 1} \leq \sum_{i\neq j}d_j < \frac{n}{n - 1}.$$
From what I have done above, I see that I have a set of $n$ linear equations with $n$ unknowns--but I also have some additional inequalities to satisfy which I do not know how to handle. I am unsure how to proceed from here. Is this some convex optimization problem I'm unaware of? If anyone has any pointers, I would greatly appreciate it!
You can simulate any probability mass function $(p_1, ..., p_n)$ this way.
Fix $(p_1, ..., p_n)$ such that $p_i\geq 0$ for all $i$ and $\sum_{i=1}^n p_i = 1$. Define $p_{max} = \max_{i \in \{1, ..., n\}} p_i$. Fix $c$ as any real number that satisfies $c \geq p_{max}$ and note that $c>0$. Define:
$$ d_i = 1-p_i/c \quad, \forall i \in \{1, \ldots, n\} $$ Then clearly $d_i \in [0,1]$ for all $i \in \{1, \ldots, n\}$, so these are valid probabilities. Further, $d_i<1$ for at least one $i \in\{1, \ldots, n\}$ (since not all the $p_i$ can be zero).
We now show this choice of $(d_1, ..., d_n)$ works: Since $d_i<1$ for at least one $i \in \{1, ..., n\}$, there is always a positive probability of terminating on each given round. Since all rounds are independent, the number of rounds is geometrically distributed and so the rounds terminate with probability 1.
Fix $i \in \{1, \ldots, n\}$. Let $Q(i)$ be the probability that your process terminates on item $i$. Then \begin{align} Q(i) &= P[\mbox{pick item $i$ on round 1} | \mbox{keep what we pick on round 1}]\\ &=\frac{P[\mbox{keep what we pick on round 1}|\mbox{pick item $i$ on round 1}]P[\mbox{pick item $i$ on round 1]}}{P[\mbox{keep what we pick on round 1}]}\\ &= \frac{(1-d_i)(1/n)}{\sum_j (1/n)(1-d_j)}\\ &= \frac{1-d_i}{\sum_j (1-d_j)} \\ &= \frac{p_i/c}{\sum_j (p_j/c)}\\ &= p_i \end{align}
Examples
Choosing $c=p_{max}$ makes the process terminate the fastest.
1) $(p_1, p_2, p_3) = (1/3, 1/3, 1/3)$.
Let $c=p_{max} = 1/3$. Then $d_i=0$ for all $i$.
2) $(p_1,p_2,p_3) = (1/2, 1/3, 1/6)$.
Let $c=p_{max} = 1/2$. Then \begin{align} d_1 &=1 - (1/2)/(1/2)=0\\ d_2 &= 1-(1/3)/(1/2) = 1/3 \\ d_3 &= 1-(1/6)/(1/2) = 2/3 \end{align}