Sampling a combination randomly

971 Views Asked by At

I want to sample a combination of $N$ elements (without replacement) from a list of $M$ elements where $M\gg N$. There are algorithms to do this when each element is picked with uniform probability. I want to do the same for the non-uniform case.

Let each specific element $i$ is associated with a positive constant $p_i$. Then, say for $N=2$, I want the probability of sampling the combination $\{i,j\}$ to be equal to $\frac{p_i+p_j}{Z}$, where $Z$ is the normalization constant, i.e. $Z=\sum_{k,l}p_k+p_l\;\;\forall k,l$.

Any hints, pointers for an efficient and unbiased sampling algorithm are much appreciated.

2

There are 2 best solutions below

3
On BEST ANSWER

In general, you want to pick $M$ distinct elements where the probability of result $A$ (a subset of $\{1,\ldots,N\}$ is $p(A) = \sum_{i \in A} p_i/Z$, $Z = \sum_A \sum_{i \in A} p_i = {N-1 \choose M-1} \sum_i p_i$. Let $S(A) = \sum_{i \in A} p_i$ and $S = \sum_{i=1}^N p_i$. Let ${\cal P}_k(B)$ denote the collection of all $k$-element subsets of $B$.

The probability of choosing item number $1$ is $$\eqalign{P(1) &= \sum_{A \in {\cal P}_{M-1}(\{2,\ldots,N\})} p(\{1\} \cup A) = \sum_{A \in {\cal P}_{M-1}(\{2,\ldots,N\})} \dfrac{p_1 + S(A)}{Z}\cr &= \dfrac{p_1}{S} + \sum_{A \in {\cal P}_{M-1}(\{2,\ldots,N\})} \sum_{j \in A} \dfrac{p_j}{{{N-1} \choose {M-1}} S} = \dfrac{p_1}{S} + \sum_{j=2}^N \dfrac{{{N-2} \choose {M-2}} p_j}{{{N-1} \choose {M-1}} S}\cr &= \dfrac{p_1}{S} + \dfrac{M-1}{N-1} \dfrac{S - p_1}{S} = \dfrac{N-M}{N-1} \dfrac{p_1}{S} + \dfrac{M-1}{N-1} } $$ You can use a sequential procedure: first decide (using this probability) whether or not to choose item $1$. Depending on whether you choose $1$ or not, you have $M-1$ or $M$ items to be chosen from the remaining $N-1$. The conditional probability of choosing $2$ given this first choice is then $$ \dfrac{\sum_{A \in {\cal P}_{M-2}(\{3,\ldots,N\})} p(\{1,2\} \cup A)}{P(1)} \ \text{or} \ \dfrac{\sum_{A \in {\cal P}_{M-1}(\{3,\ldots,N\})} p(\{2\} \cup A)}{1 - P(1)}$$ which can be obtained by a similar calculation. After deciding whether or not to choose $2$ using this conditional probability, you calculate the probability for $3$, and continue in this way until all $M$ items are chosen.

2
On

The normalization constant $Z=(M-1)\sum p_i$ as each element is involved in $M-1$ pairs. The probability that item $i$ is chosen is then $P(i)=\frac 1Z \sum_{k \neq i}(p_k+p_i)=\frac 1Z ((M-1)p_i+\frac Z{M-1}-p_i)=\frac 1{M-1}+\frac{(M-2)p_i}Z$ but as mjqxxxx pointed out, you can't just choose elements individually with this probability as the pairs will not be chosen with proper probability. I don't see anything better than listing all the pairs, calculate the probability of each, and pick on that basis. True, this is an M^2 process.