Dependent sampling preserving individual probabilities

12 Views Asked by At

I am considering the following way of dependent sampling which preserves individual probabilities:

  • Input: Given $N$ items $[N]$, each associated with a probability $p_i$, $\forall i \in [N]$
  • Output: The output would be a subset $S \subseteq [N]$, with $\Pr[i \in S] = p_i$
  • We do the following:
    • Initialization: $V = [N]$ and $S = \emptyset$
    • In each round $t$, we sample a group of items $V_t = \{i_1, i_2, \ldots, i_k\} \subseteq V$ following some distribution; the simplest case is each $i$ is chosen independent at random with some probability $p$
    • We find $p_t = \min_{j \in V_t} p_{j}$ and toss a coin with success probability $p_t$
      • if success, we update $S \gets S \cup V_t$ and $V \gets V \setminus V_t$
      • if failure, we do not update $S$ but update $p_i \gets \frac{p_i - p_t}{1 - p_t}$ for each $i \in V_t$, so that the individual probability $\Pr[i \in S]$ is preserved as the original $p_i$ for each $i \in N$; we also remove each $i$ from $V$ with $p_i = 0$ after such updates, but keep the items with non-zero residual probability $p_i$
    • The process terminates when $V = \emptyset$, and then $S$ is output

Are there any existing tools for analyzing such a sampling process with dependence? I am especially interested in the probability of subsets being sampled together, e.g., $\Pr[\{1,2,3\} \subseteq S]$.

I have studied a special simple case with all the $p_i$'s being identical. When all the $p_i$'s are identical, we can study each subset without considering the other items. However, when $p_i$'s are different, the sampling order matters and it becomes much more complicated as I can see.