How to efficiently simulate successes of several trials if probabilities are inhomogeneous

202 Views Asked by At

If I'm doing a simulation with $n$ trials, each with probability $p$, a quick way to select the successful trials is to choose a binomially distributed random number. Then randomly choose that many trials to be successful.

But what if the probabilities are distinct? I want to - as efficiently as possible - do a simulation in which I select the events that are successful. So I'd like to avoid generating a random number for each trial.

To make things specific, let's assume I've got $p_1$, ... , $p_{20}$ with each probability being some small number, say no bigger than $0.05$. I'd like to generate the successes without 20 coin flips.

Any suggestions? I know one way to do it if I order the probabilities and use a rejection sampling approach, but it would be great if I could avoid the cost of ordering them.

2

There are 2 best solutions below

0
On

Here's a suggestion:

Lets say you have $n$ trials ($T_i$), where $p_i = P(T_i=1), q_i=1-p_i$. Then there are $2^n$ combinations of successful trials (e.g., $(1,0,0,1)$ is one such combination if $n=4$).

  1. Set an $2^n$-vector, $\mathbf{P}:=(0,0,...0)$
  2. Let $I:=\{0, 1,2,...2^n-1\}$ be a set of integers
  3. For each $i\in I$, calculate its binary expansion as an n-vector $\mathbf{B}_i$ and
  4. Calculate $P(\mathbf{B}_i)=\prod\limits_{j=1}^n[p_iB_{ij}+q_i(1-B_{ij})]$
  5. Set $P_i=P(\mathbf{B}_i)$

After doing this, you will have a discrete probability distribution over $i\in I:\mathbf{P}(i)$

Now, to simulate successes, you just do the following:

  1. Let $X\sim \mathbf{P}$ and draw an integer from it.
  2. The successful trials will be the random vector $\mathbf{B}_X$

So, I've substituted some up-front computation to reduce your problem to a single-variable simulation.

0
On

You can efficiently generate the number of successful trials via the Poisson-Binomial distribution.

The PMF need only be calculated once for a given set of $P_i$, and variates generated by the method of your choice - using simple inverse sampling I generated >5 million / second on an old atom-based netbook.

Once you have the number of successes (or a sampling of them), you can use your alluded to method to select the actual members of the trial vector that were the successes. You might also have a peek at example 22.10.2 in Numerical Analysis for Statisticians by Lange.