Probability of being infected in a series of meetings

I have difficulty interpreting a simple probabilistic calculation and would be grateful for help. We are in the midst of the Covid-19 epidemic, and I am trying to assess someone's probability of being infected within a time window. For simplicity, I assume the probability of getting infected during a single meeting depends on its duration alone.

In math terms:

The probability of being infected in a given meeting is $P(t) = t\cdot\Theta$, where $t$ is the meeting duration and $\Theta$ is a calibration parameter of the virus (chosen so that the probability stays below 1).

Hence, assuming the meetings are independent, the probability of being infected in a series of meetings is the complement of the event of not being infected in any meeting:

$$P(t_1,\dots,t_N) = 1-\prod_{k=1}^N(1-t_k\cdot\Theta)\tag{*}$$

Does this inference sound reasonable? It seems to me that this is material from a basic probability course.

To my surprise, I saw an article whose calculation reaches the following result for the same question (under the same assumptions):

$$P(t_1,\dots,t_N) = 1- \exp\left(-\Theta\sum_{k=1}^{N}t_{k}\right)\tag{**}$$

I do not understand the relationship between $(*)$ and $(**)$, or how a complementary-event calculation leads to equation $(**)$.

Thanks a lot!

There is 1 best solution below

It is not actually true that the probability of getting infected in a meeting is a linear function of the length $t$ of the meeting; this is an approximation that is only valid when the product $\Theta t$ of the time and the infection rate is small. The simplest update to this linear model is the following: think about the problem in small time steps of length $\frac{1}{n}$, and suppose that in each small time step you have a small probability $\frac{\Theta}{n}$ of being infected, and that these infections are independent. A meeting of length $t$ consists of $nt$ such steps, so the probability that you are infected after the entire meeting is $1$ minus the probability that you are not infected in any step, which is

$$1 - \left( 1 - \frac{\Theta}{n} \right)^{nt}$$

and for large $n$ this approaches

$$1 - e^{- \Theta t}.$$
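To see the limit numerically, here is a minimal Python sketch. The function names and the values of `theta` and `t` are made up for illustration and are not taken from any article or data set.

```python
import math

def discrete_infection_probability(theta, t, n):
    """1 - (1 - theta/n)^(n*t): independent small steps of length 1/n."""
    return 1.0 - (1.0 - theta / n) ** (n * t)

def continuous_infection_probability(theta, t):
    """The large-n limit, 1 - exp(-theta * t)."""
    return 1.0 - math.exp(-theta * t)

# Hypothetical values: risk rate 0.3 per hour, meeting of 2 hours.
theta, t = 0.3, 2.0
for n in (1, 10, 100, 10_000):
    print(n, discrete_infection_probability(theta, t, n))
print("limit", continuous_infection_probability(theta, t))
```

With $n = 1$ the discrete value is $1 - 0.7^2 = 0.51$, and the values decrease toward the exponential limit as $n$ grows.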

This says that the time to infection follows an exponential distribution. The same argument applies to a sequence of meetings (after all, a sequence of meetings is just one long meeting) and gives $1 - e^{-\Theta \sum t_i}$, which is your $(**)$. Your $(*)$ is its small-$\Theta t_k$ approximation, since $1 - \Theta t_k \approx e^{-\Theta t_k}$ when $\Theta t_k$ is small. Really this should be $1 - e^{-\sum \Theta_i t_i}$, as the risk-per-unit-time parameter $\Theta_i$ could also differ dramatically between meetings. I have not checked if this is what the article you link is doing.
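As a rough check of how the product formula $(*)$ from the question relates to the exponential formula $1 - e^{-\Theta \sum t_i}$, the sketch below evaluates both for a few values of $\Theta$. The durations and rates are hypothetical, chosen only so that every $\Theta t_k$ stays below 1.

```python
import math

def product_model(durations, theta):
    """Formula (*): 1 - prod(1 - theta * t_k); requires theta * t_k <= 1."""
    p_not_infected = 1.0
    for t in durations:
        p_not_infected *= 1.0 - theta * t
    return 1.0 - p_not_infected

def exponential_model(durations, theta):
    """Formula (**): 1 - exp(-theta * sum of t_k)."""
    return 1.0 - math.exp(-theta * sum(durations))

# Hypothetical meeting durations in hours.
durations = [0.5, 1.0, 2.0]
for theta in (0.001, 0.01, 0.1):
    print(theta, product_model(durations, theta), exponential_model(durations, theta))
```

Because $1 - x \le e^{-x}$, the product formula never gives a smaller probability than the exponential one, and the two agree closely when every $\Theta t_k$ is small.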

This is only the next simplest model after the linear model (which it reduces to if $\Theta t$ is small), and very importantly it does not account for the fact that the risk-per-unit-time increases over time if people are indoors and expelling droplets into the air and the room is poorly ventilated. I think the risk-per-unit-time after an hour or two may actually be much higher than the risk-per-unit-time at the beginning of a meeting, but don't quote me on that. To account for this we should be computing something more like

$$1 - \exp \left( -\int_0^{\sum t_i} \Theta(t) \, dt \right)$$

where the risk $\Theta(t)$ now varies over time; above we considered the special case that $\Theta(t)$ is constant or piecewise constant. But we would need more modeling assumptions to figure out a good choice of shape for $\Theta(t)$, then more data to estimate it.
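Purely as an illustration of the time-varying case, here is a sketch that evaluates $1 - \exp\left(-\int_0^T \Theta(t)\,dt\right)$ with the trapezoidal rule. The linearly rising `rising_hazard` and its parameters `theta0` and `slope` are assumptions I made up for the example, not an estimate of how the real risk behaves.

```python
import math

def infection_probability(hazard, total_time, steps=1000):
    """1 - exp(-integral of hazard over [0, total_time]), trapezoidal rule."""
    h = total_time / steps
    integral = 0.5 * (hazard(0.0) + hazard(total_time))
    for i in range(1, steps):
        integral += hazard(i * h)
    integral *= h
    return 1.0 - math.exp(-integral)

def rising_hazard(t, theta0=0.05, slope=0.02):
    """Hypothetical risk-per-unit-time that grows linearly with time t."""
    return theta0 + slope * t

# Hypothetical total meeting time of 3 hours.
constant = infection_probability(lambda t: 0.05, 3.0)
rising = infection_probability(rising_hazard, 3.0)
print(constant, rising)
```

With a constant hazard this reduces to $1 - e^{-\Theta t}$; with the rising hazard the integral is $0.05 \cdot 3 + 0.02 \cdot 3^2 / 2 = 0.24$, so the probability is higher.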

For practical risk assessments I recommend the microCOVID project, which some of my friends were involved with, and which has been updated to take B117 into account.