Independent sampling with at least two elements, without re-sampling

55 Views Asked by At

Suppose we have a sequence of numbers (WLOG, $[n]$) and we want to sample elements from it into a set $S$, where each element is sampled independently at random with $p_i := \Pr[i \in S] \in (0, 1), \forall i \in [n]$.

We do this multiple times, and each time we need to at least sample two nodes. Otherwise, we discard this sampling and re-sample. A naive way to implement this is to sample as usual sequentially, and re-sample from scratch if we end up with less than two elements. Are there better one-shot ways to do so without re-sampling?

1

There are 1 best solutions below

1
On BEST ANSWER

Intuitively you want to sample from the conditional distribution $S|\#S\geq 2$, but to do this directly you would need to calculate the probability of all $2^n$ possible outcomes for $S$, which is problematic for large $n$.

Another way to sample from this conditional distribution is to use Gibbs Sampling. The Gibbs Sampling procedure should look something like this:

Step 1: Choose an initial value for $S$. Could be $S = \{1,2\}$ or $S=\{1,\dots,n\}$.

Step 2: For each $i\in \{1,\dots,n\}$ do the following. If $S\setminus \{i\}$ contains only $1$ element, then $i$ must be included in $S$ with probability $1$. Otherwise $i$ is included in $S$ with probability $p_i$ (or removed from $S$ with probability $1-p_i$ if $i$ was already in $S$).

Step 3: Save the current value of $S$ in a list and repeat step 2 with the current value of $S$. Continue untill you have sufficiently many samples.

Note that the resulting list of samples will not be independent. You will however get a Markov chain, which converges to the desired distribution, and you will still be able to use results such as the law of large numbers to make inference.