How to cheapest test N people for an infection?

74 Views Asked by At

Say you have $N$ people each having a probability of $p$ for being affected (and, $p+q=1$, a probability of $q$ for being not infected).

Say you have a test where you can combine the samples for any $n \leq N$ people and the test tells if any (at least, one) of the people is infected.

Looking for a testing scheme that tells us which people are infected and which are not, how would that be accomplished with the lowest number of tests?

To give an example what I'm talking about, one approach would be to do a collective test on all $N$ people, and if that is negative, we're done, with just the costs for one test. If not, we divide the people into subsets, testing each subset and then go on until each subset has cardinality 1.

Instinct tells me to use two groups of $N/2$ as subsets, but I don't see an elegant way to prove this, short of brute-force computation of every possibility.

Edit: as @Ross Millikan pointed out, the answer depends heavily on $p$. I had low values for $p$ in mind, lower than $0.1$ but I would like a general answer.

Edit2: as part of an answer (or a partly answer) I would like to see a way to formalize this question (which I as of now have difficulties with).

2

There are 2 best solutions below

2
On BEST ANSWER

In case $p \approx 0$ this is a known optimization problem, that was solved during The Second World War. The proposed procedure is the following:

Take n people, and devide them into $\frac{n}{k}$ groups (each group consists of k people). The groups' participants' blood is mixed and tested together, if the test comes out ot be false, than everyone in this group is not infected, otherwise test each groups' participant blood individually.

The expected number of tests taken is:

$\mathbb{E}(\# \ tests) = \frac{n}{k} \cdot \left( 1 \cdot (1 - p)^k + (k + 1) \cdot (1 - (1 - p)^k) \right) \to \min\limits_{k}$

If we set derivative of this function to 0 we won't get analytical solution, but we can approximate our function, using fact, that $(1 - p)^k \approx 1 - pk, \ p \approx 0$

So: $\mathbb{E}(\# \ tests) = n \cdot \left(1 + \frac{1}{k} - (1 - p)^k \right) \approx n \cdot (1 + \frac{1}{k} - 1 + pk) \to \min\limits_{k}$

$\mathbb{E}(\# \ tests)^{\prime}_{k} \approx n \cdot \left(\frac{1 + pk^2}{k} \right)^{\prime}_{k} = n \cdot \frac{2 pk^2 - 1 - pk^2}{k^2} = 0 \Rightarrow \boxed{k^{\ast} = \frac{1}{\sqrt{p}}}$

For $p = 0.01$ we have $\mathbb{E}(\# \ tests) \approx \frac{n}{5}$, so we get 5 times less tests.

3
On

The correct answer depends on what fraction of the people are infected. If most of them are, the best you can do is test one by one, using $N$ tests. If you test two together you will (almost certainly) get a positive result and the test is wasted.

If at most one person is infected, you do best with a binary search, using $2\log_2 N$ tests. Test each half, then each half of the one that came back positive, and so on.

If some intermediate fraction is infected, you get the most information when the probability a test comes back positive is $\frac 12$. Use your estimate of the infection rate to determine the first grouping, then update your estimate from how many of those come back positive.

Added regarding formalizing: if you know the infection rate $p$ you can ask what strategy gives the smallest expected number of tests to use. You can also ask about how to best revise an estimate of $p$ given a certain amount of combined testing.