Efficiently modify the combined probability of many independent events when variable values change

Question:

Let $C_1$ and $C_2$ be two events that are independent and not mutually exclusive that occur with different probabilities $p_1$ and $p_2$.

For these two events, I understand that:

  1. $$ P(C_1 \cup C_2) = P(C_1) + P(C_2) - P(C_1)*P(C_2) = p_1 + p_2 - p_1*p_2$$ or

  2. $$ P(C_1 \cup C_2) = 1 - P(\overline{C_1})*P(\overline{C_2}) = 1 - (1-p_1)*(1-p_2)$$

In addition, each probability $p_n$ is the product of two other probabilities $p_a$ and $p_{b,n}$, where $p_a$ is constant across events but $p_{b,n}$ differs between events, such that:

3. $$ P(C_1 \cup C_2) = 1 - P(\overline{C_1})*P(\overline{C_2}) = 1 - (1-p_{a}p_{b1})*(1-p_ap_{b2})$$
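As a quick sanity check with made-up values $p_a = 0.5$, $p_{b1} = 0.4$ and $p_{b2} = 0.6$ (so $p_1 = 0.2$ and $p_2 = 0.3$; these numbers are purely illustrative), forms 1 and 3 give the same result:

$$ p_1 + p_2 - p_1*p_2 = 0.2 + 0.3 - 0.06 = 0.44 $$

$$ 1 - (1-p_{a}p_{b1})*(1-p_ap_{b2}) = 1 - 0.8*0.7 = 0.44 $$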

What is a generalized form for calculating the probability that at least one event occurs given $N$ independent events (rather than the 2 events in the example above), such that $$P(C_1\ \cup\ ...\ \cup\ C_N) = 1-\prod_{n=1}^{N}{ (1 - p_a*p_{b,n}) } $$

How can $p_a$ be isolated?

Application:

I ask because I need to combine the probability of occurrence of many thousands of events (i.e., infections) for millions of individuals (e.g., trees) in an agent-based/individual-based simulation model, using the following equation:

4. $$ P(C_1 \cup C_2 \ \cup ... \cup \ C_N) = 1 - P(\overline{C_1})*P(\overline{C_2})* ... *P(\overline{C_N}) = 1 - (1-p_ap_{b1})*(1-p_ap_{b2})* ... *(1-p_ap_{bN})$$
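For reference, the exact calculation for a single individual could be sketched in Python roughly as follows. The function name `union_probability` and the list `p_b` are hypothetical, chosen only for illustration. The sketch sums logarithms instead of multiplying many factors close to 1, which is one common way to limit rounding error, and it assumes every $p_a p_{b,n}$ is strictly less than 1.

```python
import math

def union_probability(p_a, p_b):
    """Exact P(at least one event) for independent events with p_n = p_a * p_b[n].

    Hypothetical helper: assumes 0 <= p_a * p_b[n] < 1 for every n.
    """
    # log of the probability that no event occurs: sum of log(1 - p_n)
    log_none = sum(math.log1p(-p_a * pb) for pb in p_b)
    # 1 - exp(log_none), computed without cancellation for small probabilities
    return -math.expm1(log_none)

# Two-event example: p_a = 0.5, p_b = [0.4, 0.6] corresponds to 1 - 0.8 * 0.7
print(union_probability(0.5, [0.4, 0.6]))
```

This still needs every $p_{b,n}$ each time $p_a$ changes, which is exactly the cost described below.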

The overall probability of at least one event occurring $P(C_1 \cup C_2 \ \cup ... \cup \ C_N)$, hereafter referred to as $P$, is calculated during every time step in the model.

Over time in the simulation, the value of $p_a$ will change (depending on changing state variables of the individual trees) while the values of $p_{bn}$ remain constant. Therefore, as the value of $p_a$ changes between time steps, the value of $P$ will need to be modified to reflect this change.

However, given the number of values involved in the calculation (thousands of calculations for each of millions of individual trees), it is not practical, and may not even be possible given the constraints of the software and computing resources I am using, to store each value of $p_{bn}$ for each individual (tree).

Rather than recalculate the overall probability from scratch, which requires lots of time and computing resources, I want to be able to modify the overall probability $P$ based on the change in the value of $p_a$.

Answer:

I found an approximation that is accurate enough for my application based on a publication by Ursini and Martins (2017). It works best for large $n$ and low probabilities (i.e., the closer to 0, the better).

For this approximation, one calculates the mean probability across the events in the union:

$$\bar{p} = \frac{1}{n} \sum_{i=1}^{n}{p_i}$$

Then, one calculates the approximation of the union as:

$$P = 1-(1-\bar{p})^n$$

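This suits my purposes because each $p_i = p_a p_{b,i}$, so $\bar{p} = \frac{p_a}{n} \sum_{i=1}^{n}{p_{b,i}}$. I only have to calculate and store the sum of the $p_{b,i}$ once per individual and can then rescale it whenever $p_a$ changes, which simplifies the computation.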
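A minimal Python sketch of this approach, assuming each individual stores only the sum of its $p_b$ values and the count $n$, might look like the following. The function names and the example values are hypothetical. The exact product is included only to compare against the approximation. By the AM-GM inequality, the approximation never exceeds the exact value, so it acts as a lower bound on $P$.

```python
import math

def approx_union_probability(p_a, sum_p_b, n):
    """Approximate P using only the stored sum of p_b and the event count n."""
    mean_p = p_a * sum_p_b / n          # mean of p_i = p_a * p_b[i]
    return 1.0 - (1.0 - mean_p) ** n

def exact_union_probability(p_a, p_b):
    """Exact P, for comparison only; needs every individual p_b value."""
    return 1.0 - math.prod(1.0 - p_a * pb for pb in p_b)

# Made-up values for one individual; computed once and then only the sum is kept
p_b = [0.01, 0.02, 0.03]
sum_p_b, n = sum(p_b), len(p_b)

# When p_a changes between time steps, only the cheap formula is re-evaluated
for p_a in (0.2, 0.5, 0.9):
    approx = approx_union_probability(p_a, sum_p_b, n)
    exact = exact_union_probability(p_a, p_b)
    print(f"p_a={p_a}: approx={approx:.6f} exact={exact:.6f}")
```

Each update costs a constant amount of work per individual, regardless of $n$. The gap to the exact value grows as the $p_{b,i}$ become larger or more spread out, so it is worth checking the error on a sample of individuals first.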