Simulating outcomes with impossible outcomes from conditional probabilities


Simplified context

Consider $n$ coins $\{C_i\}$, each with probability $P_i$ of landing on heads ($H$).

Let $X_i\hookrightarrow B(P_i)$ be the random variable representing the outcome of a coin $C_i$.

Let $S$ be the number of heads ($H$) when tossing all $n$ coins.

Problem

I have to simulate $N$ values $S_k$, with small $P_i$, and intuitively I feel it should be less computationally intensive to first determine how many of the $N$ trials are not all tails ($S_k=0$).

What I tried

I considered that, because $$P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B)},$$ then $$P(X_i=H\mid S\geqslant1)=\frac{P(S\geqslant1\mid X_i=H)\,P(X_i=H)}{P(S\geqslant1)}=\frac{1\times P(X_i=H)}{P(S\geqslant1)},$$ and that $$P(S\geqslant1)=1-P(S=0)=1-\prod_{i=1}^n (1-P_i)=P_{S\geqslant1},\quad(1)$$ hence $$P(X_i=H\mid S\geqslant1)=\frac{P_i}{1-\prod_{i=1}^n (1-P_i)}=P_{i\mid S\geqslant1}.$$
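As a quick numerical sanity check of these formulas, here is a minimal Python sketch (the simulations in the question are in R, but the arithmetic is language-agnostic; the $P_i$ values are assumed for illustration) computing $P_{S\geqslant1}$ and the conditional marginals $P_{i\mid S\geqslant1}$:

```python
import math

# Assumed small head probabilities P_i (illustrative values only)
P = [0.01, 0.02, 0.05, 0.03]

# (1): P(S >= 1) = 1 - prod_i (1 - P_i)
p_s_ge_1 = 1.0 - math.prod(1.0 - p for p in P)

# Conditional marginals P(X_i = H | S >= 1) = P_i / P(S >= 1)
p_cond = [p / p_s_ge_1 for p in P]

print(p_s_ge_1)   # probability that at least one coin shows heads
print(p_cond)     # each coin's head probability conditional on S >= 1
```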

What I wanted to do then is:

  • determine if $S_k=0$ or $S_k\geqslant1$ (considering $(S_k\geqslant1)\hookrightarrow B(P_{S\geqslant1})$)
  • if $S_k\geqslant1$, draw $(X_1,...,X_n)$ with probability $P_{i|S\geqslant1}$ for each $C_i$, instead of $P_i$

This way, I would have to simulate $(X_1,\dots,X_n)$ only about $N\times P_{S\geqslant1}$ times (the expected number of trials with $S_k\geqslant1$), which with small $P_i$ should be only a small fraction of $N$, in view of $(1)$.

Problem bis

If I adopt this approach of simulating the set of $\{k\}$ for which $S_k\geqslant1$, and for those trials only drawing $(X_1,\dots,X_n)$ with probability $P_{i\mid S\geqslant1}$ for each coin, I still end up with some $S_k=0$, which I want to treat as an impossible outcome: simply replacing $X_i\hookrightarrow B(P_i)$ by $X_i\hookrightarrow B(P_{i\mid S\geqslant1})$ does not prevent it.

Questions

I would like to know two things:

  1. Is my math regarding the conditional probabilities correct?
  2. Would it be right to redraw $(X_1,\dots,X_n)$ whenever $S_k=0$, until $S_k\geqslant1$, given that I wanted to draw $(X_1,\dots,X_n)$ knowing that $S_k\geqslant1$?

Further note

I am a biologist (so, apologies first if the maths aren't the most rigorous; I know this should be easy, but it's not my forte), writing simulations in R. I feel question 2 could be addressed by:

  • listing all $2^n-1$ potential outcomes with $S_k\geqslant1$,
  • weighting them by $P_{i\mid S\geqslant1}$,
  • only drawing from these outcomes, thus avoiding $S_k=0$ without having to redraw (which feels dirty in terms of probabilities).

But then listing $2^n-1$ $n$-vectors sounds a bit hard on memory for large $n$ (e.g. $n$ ranging from $10^1$ to $10^4$), hence my trying to confirm whether redrawing is OK when encountering a theoretically impossible outcome.

Answer

It’s an interesting approach. It doesn’t work quite how you tried it, but it can be made to work.

The only thing that’s wrong in your approach is that conditional on there being at least one heads, the coins are no longer independent. (You don’t actually state that they’re unconditionally independent, but I’m assuming that you intended to imply that.) So you can’t just replace their probabilities in the Bernoulli trials, continue to simulate them independently and throw out the results with no heads; you need to simulate the joint distribution that they have conditional on there being at least one heads, and you’re of course right that you can’t do that by doing anything exponential in $n$ when $n$ is up to $10^4$.
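To see concretely that the conditional coins are not independent, here is a small exact computation in Python (two assumed coin probabilities, $n=2$): it compares the true conditional probability of both coins showing heads given $S\geqslant1$ with what the "replace the marginals, redraw on all tails" scheme from the question would produce (rejection of the all-tails outcome amounts to renormalizing over the remaining outcomes).

```python
p1, p2 = 0.3, 0.5                       # assumed coin probabilities
p_ge1 = 1 - (1 - p1) * (1 - p2)         # P(S >= 1)

# True conditional probability of both heads, given S >= 1
true_hh = p1 * p2 / p_ge1

# Scheme from the question: independent draws with the conditional
# marginals q_i = P_i / P(S >= 1), rejecting all-tails and renormalising
q1, q2 = p1 / p_ge1, p2 / p_ge1
scheme_hh = q1 * q2 / (1 - (1 - q1) * (1 - q2))

print(true_hh, scheme_hh)               # the two values differ
```

The two numbers disagree, so the redraw scheme with swapped-in marginals does not reproduce the correct joint law.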

Fortunately, this is nevertheless possible, and you can even extend it to all values of $S$, not just $0$.

We have

\begin{eqnarray} P(S=1) &=& \sum_iP_i\prod_{j\ne i}(1-P_j) \\ &=& \prod_j(1-P_j)\sum_i\frac{P_i}{1-P_i}\;. \end{eqnarray}

Note how I divided by $1-P_i$ to complete the product so it can be taken out of the sum, making the sum easy to compute.

The individual terms are the probabilities for the events $E_i$ that coin $i$ shows heads and the others don’t:

$$ P(E_i)=\frac{P_i}{1-P_i}\prod_j(1-P_j)\;. $$

You can normalize their sum to $1$ to get the conditional probabilities of the $E_i$ given $S=1$:

$$ P(E_i\mid S=1)=\frac{P_i}{1-P_i}\left(\sum_j\frac{P_j}{1-P_j}\right)^{-1}\;. $$

We can continue like this for higher $S$. We have

\begin{eqnarray} P(S=2) &=& \sum_{i\lt j}P_iP_j\prod_{k\ne i,j}(1-P_k) \\ &=& \prod_k(1-P_k)\sum_{i\lt j}\frac{P_i}{1-P_i}\frac{P_j}{1-P_j}\;. \end{eqnarray}

Again, the individual terms are the probabilities for the events $E_{ij}$ that coins $i$ and $j$ show heads and the others don’t,

$$ P(E_{ij})=\frac{P_i}{1-P_i}\frac{P_j}{1-P_j}\prod_k(1-P_k)\;, $$

and you can normalize them to get the probabilities conditional on $S=2$:

$$ P(E_{ij}\mid S=2)=\frac{P_i}{1-P_i}\frac{P_j}{1-P_j}\left(\sum_{k\lt l}\frac{P_k}{1-P_k}\frac{P_l}{1-P_l}\right)^{-1}\;. $$

Now, while the single sum was trivial to compute, here we have a double sum which for $n$ of the order of $10^4$ would already cost of the order of $10^8$ operations. This is still feasible, but for $S=3$ it would become a triple sum with of the order of $10^{12}$ operations, so we should try to avoid this. Fortunately Albert Girard solved this problem for us in $1629$; the result is known as Newton’s identities (because Newton rediscovered it almost $40$ years later). If $e_k$ is the sum of the products of all $k$-tuples of distinct variables $a_i$ (the elementary symmetric polynomials), and $p_k$ is the sum of their $k$-th powers, $p_k=\sum_ia_i^k$, then

$$ e_k=\frac1k\sum_{j=1}^k(-1)^{j-1}e_{k-j}p_j\;. $$

Thus we have $e_1=p_1$ and

\begin{eqnarray} e_2 &=& \frac12(e_1p_1-p_2) \\ &=& \frac12\left(p_1^2-p_2\right)\;. \end{eqnarray}

So you can recursively compute the $e_k$ from the $p_k$, and the $p_k$ are all single sums that you can compute in $O(n)$ time. (In your case $a_i=\frac{P_i}{1-P_i}$.)
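As a sketch of this recursion in Python (assumed $P_i$ values; here $a_i=\frac{P_i}{1-P_i}$, and $P(S=k)=e_k\prod_j(1-P_j)$, since $e_k$ collects exactly the terms of the expansions above):

```python
import math

P = [0.1, 0.2, 0.05, 0.15]           # assumed coin probabilities
n = len(P)
a = [p / (1 - p) for p in P]         # a_i = P_i / (1 - P_i)
base = math.prod(1 - p for p in P)   # prod_j (1 - P_j)

# Power sums p_k = sum_i a_i^k, for k = 1..n
pw = [sum(x**k for x in a) for k in range(1, n + 1)]

# Girard/Newton recursion: e_k = (1/k) * sum_{j=1}^{k} (-1)^(j-1) e_{k-j} p_j
e = [1.0]                            # e_0 = 1
for k in range(1, n + 1):
    e.append(sum((-1) ** (j - 1) * e[k - j] * pw[j - 1]
                 for j in range(1, k + 1)) / k)

# Full distribution of S in O(n^2) arithmetic (each p_k is a single O(n) sum)
dist = [e[k] * base for k in range(n + 1)]
print(dist)
```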

The final ingredient to turn this into a feasible simulation algorithm is to apply the idea behind these identities also to the choice of the coins that show heads.

For $S=1$, the choice is easy to make in $O(\log n)$ time: Compute the partial sums $s_k=\sum_{i=1}^kP(E_i\mid S=1)$ (you only need to do that once, not per trial), draw a random number $u$ uniformly from $[0,1)$ and perform a binary search in the array of partial sums to find $i$ such that $s_{i-1}\le u\lt s_i$ (with $s_0=0$); then coin $i$ is the one that shows heads.
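A minimal Python sketch of this inverse-CDF draw (assumed $P_i$ values; `bisect` performs the binary search over the precomputed partial sums):

```python
import bisect
import itertools
import random

P = [0.1, 0.2, 0.05, 0.15]            # assumed coin probabilities
a = [p / (1 - p) for p in P]
total = sum(a)
cond = [x / total for x in a]         # P(E_i | S = 1)

# Partial sums s_k, computed once (not per trial)
partial = list(itertools.accumulate(cond))

def draw_single_heads(rng=random):
    """Return the index of the coin showing heads, conditional on S = 1."""
    u = rng.random()                  # uniform on [0, 1)
    # Binary search for i with s_{i-1} <= u < s_i; clamp against round-off
    return min(bisect.bisect_right(partial, u), len(P) - 1)

print(draw_single_heads())
```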

For $S=2$, we can do the same thing twice. We might get the same coin twice – but look at Newton’s identity for $e_2$ again: The term $p_1^2$ represents the experiment of drawing two coins according to the conditional probabilities for $S=1$, and subtracting $p_2$ represents discarding the cases where the same coin is drawn twice. So we’ll get the right distribution by doing that. The same approach works for $S=s\gt2$.

To summarize: You can find the complete distribution of $S$ in $O(n)$ time using Newton’s identities. Then in each trial you can draw a value $s$ of $S$ from that distribution and then keep drawing $s$ coins from the conditional distribution for $S=1$ until all $s$ are different. (You need to redraw all $s$ coins; if you only redraw the duplicates, you won’t get the right distribution.)
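Putting the pieces together, here is a hedged end-to-end sketch in Python (the poster's simulations are in R; the $P_i$ are assumed illustrative values): compute the distribution of $S$ with the Girard/Newton recursion, then per trial draw $s$ and rejection-sample $s$ distinct coins from the $S=1$ conditional distribution.

```python
import bisect
import itertools
import math
import random

P = [0.1, 0.2, 0.05, 0.15]                 # assumed coin probabilities
n = len(P)
a = [p / (1 - p) for p in P]               # a_i = P_i / (1 - P_i)
base = math.prod(1 - p for p in P)         # prod_j (1 - P_j)

# Distribution of S via the Girard/Newton recursion: P(S = k) = e_k * base
pw = [sum(x**k for x in a) for k in range(1, n + 1)]
e = [1.0]
for k in range(1, n + 1):
    e.append(sum((-1) ** (j - 1) * e[k - j] * pw[j - 1]
                 for j in range(1, k + 1)) / k)
cdf_S = list(itertools.accumulate(ek * base for ek in e))

# CDF of the heads-coin index conditional on S = 1
cdf_E = list(itertools.accumulate(x / sum(a) for x in a))

def draw_trial(rng=random):
    """One trial: return the set of coin indices showing heads."""
    # Draw S from its exact distribution (clamped against float round-off)
    s = min(bisect.bisect_right(cdf_S, rng.random()), n)
    while True:
        # Draw s coins from the S = 1 conditional; on any duplicate,
        # redraw ALL s coins (redrawing only duplicates would bias the law)
        coins = {min(bisect.bisect_right(cdf_E, rng.random()), n - 1)
                 for _ in range(s)}
        if len(coins) == s:
            return coins

print(draw_trial())
```

For large $n$ one would precompute once and vectorize the per-trial draws, but the structure stays the same.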

This should work well if none of the $P_i$ are much larger than the others. If they are, cases with $S\gt1$ might become inefficient because you’d often get results with multiple copies of the most frequent coins. In that case, you could just simulate the most frequent coins separately and apply the algorithm only to the remaining coins.

The algorithm also becomes very inefficient if the sum of the $P_i$ is of the order of $1$, because then we’d often have to draw large tuples that are very likely to contain duplicates. But of course in this case the entire approach is inefficient, since $P(S\ne0)$ is not small, so in this case you should just simulate the coins directly.

I hope I’ve included enough of the details to make this workable for you; if not, please feel free to ask.