Bayes, two tests in a row

I came up with a standard Bayesian example to illustrate my confusion.

There is an epidemic. A person has the disease with probability $\frac{1}{100}$. The authorities decide to test the population, but the test is not completely reliable: overall, $\frac{1}{110}$ of the people tested get a positive result, and given that you have the disease, the probability of getting a positive result is $\frac{80}{100}$.

I am interested in what happens after a person takes another test, specifically how much more information we would gain.

Probability after one test

Let $D$ denote the event of having the disease, and let $T$ denote the event of a positive test result. If we are interested in finding $P(D|T)$, then we can just apply Bayes' rule:

$$ P(D|T) = \frac{P(T|D)P(D)}{P(T)} = \frac{0.8 \times 0.01}{1/110} = 0.88 $$

This feels about right.

Probability after two tests

This is where I think I misunderstand Bayes' rule somewhat. Let $TT$ denote the outcome of two positive tests. We are now interested in calculating:

$$ P(D|TT) = \frac{P(TT|D)P(D)}{P(TT)} $$

The prior $P(D)$ is still $\frac{1}{100}$. $P(TT|D)$ would now be $0.8 \times 0.8$ because the two tests can be assumed to be independent.

But I do not know how to deal with $P(TT)$. It cannot be $\frac{1}{110} \times \frac{1}{110}$, because then:

$$ \frac{P(TT|D)P(D)}{P(TT)} = \frac{0.64 \times 0.01}{0.009^2} > 1 $$

What is the right approach to the two-test Bayesian case?

There are 4 best solutions below

4
On BEST ANSWER

As an aside, I believe the proper value for $P(D|T)$ is exactly $0.88 = \frac{8}{10}\cdot\frac{1}{100}\cdot\frac{110}{1}$.

We are given $P(T) = \frac{1}{110}$, the probability of the test showing a positive regardless of disease state. By the law of total probability, this has to be the joint probability of a positive result and being diseased plus the joint probability of a positive result and being disease-free. In other words: $$ \begin{align} P(T) &= P(T\cap D) + P(T\cap \neg D)\\ &= P(T|D)P(D) + P(T|\neg D)P(\neg D)\\ \frac{1}{110} &=\frac{8}{10}\frac{1}{100} + P(T|\neg D)\frac{99}{100}\\ P(T|\neg D) &=\frac{2}{1815} \end{align} $$

Next, assuming the two tests are independent given the disease state, so that $P(TT|\neg D) = P(T|\neg D)^2 = \frac{4}{3294225}$: $$ \begin{align} P(TT) &= P(TT|D)P(D) + P(TT|\neg D)P(\neg D)\\ &= \frac{64}{100}\frac{1}{100} + \frac{4}{3294225}\frac{99}{100}\\ &=\frac{21087}{3294225} = \frac{213}{33275} \approx 0.006401202 \end{align} $$ Now $$ \begin{align} P(D|TT) &= \frac{P(TT|D)P(D)}{P(TT)}\\ &= \frac{64}{100}\frac{1}{100}\frac{33275}{213}\\ &= \frac{5324}{5325} \approx 0.999812207 \end{align} $$

So, after two tests, we are really sure this person is diseased.
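
The arithmetic above can be checked with exact fractions. This is only a minimal sketch: the variable names are made up for illustration, and it assumes, as the derivation does, that the two tests are independent given the disease state.

```python
from fractions import Fraction

# Numbers taken from the question.
p_d = Fraction(1, 100)          # prior P(D)
p_t = Fraction(1, 110)          # overall positive rate P(T)
p_t_given_d = Fraction(8, 10)   # P(T|D)

# Solve the law of total probability for the false-positive rate P(T|not D).
p_t_given_not_d = (p_t - p_t_given_d * p_d) / (1 - p_d)

# Two positives, assuming the tests are independent given the disease state.
p_tt = p_t_given_d**2 * p_d + p_t_given_not_d**2 * (1 - p_d)
p_d_given_tt = p_t_given_d**2 * p_d / p_tt

print(p_t_given_not_d)  # 2/1815
print(p_tt)             # 213/33275
print(p_d_given_tt)     # 5324/5325
```

Using `Fraction` instead of floats keeps every intermediate value exact, so the results can be compared directly with the fractions in the derivation.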

Update

In general, though, with Bayesian estimation, one can use the previous posterior as the current prior (see slides 3 and 4). That works here as well. Let $P(D^*)$ be the new prior (after one positive test). Now we are back in the one-test world, since one test after one test is the same as two tests after no tests. So $P(D^*)$ is $0.88$ from above. $P(T|D^*)$ remains the same, as does $P(T|\neg D^*)$. So, all we need is: $$ \begin{align} P(TT) &= P(T|D^*)P(D^*) + P(T|\neg D^*)P(\neg D^*)\\ &= 0.8\cdot 0.88 + \frac{2}{1815}\cdot 0.12\\ &= \frac{426}{605} \approx 0.704132231 \end{align} $$

Note that $P(TT)$ in the $D^*$ world is much greater than $P(TT)$ in the $D$ world. This stands to reason, since $TT$ in the $D^*$ world is actually $T$ (one test) after already knowing of one positive test, whereas $TT$ in the $D$ world is two tests a priori, knowing nothing. Now, as before: $$ \begin{align} P(D|TT) &= \frac{P(T|D^*)P(D^*)}{P(TT)}\\ &=\frac{8}{10}\frac{88}{100}\frac{605}{426}\\ &=\frac{5324}{5325} \approx 0.999812207 \end{align} $$
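
The "posterior becomes the next prior" idea can be sketched as a small function applied twice. The function name `update` and its parameter names are hypothetical, chosen just for this illustration; the false-positive rate $\frac{2}{1815}$ is the value derived earlier in this answer.

```python
from fractions import Fraction

def update(prior, p_pos_given_d, p_pos_given_not_d):
    """Return P(D | positive test) for one positive result, via Bayes' rule."""
    evidence = p_pos_given_d * prior + p_pos_given_not_d * (1 - prior)
    return p_pos_given_d * prior / evidence

sens = Fraction(8, 10)    # P(T|D)
fpr = Fraction(2, 1815)   # P(T|not D)

after_one = update(Fraction(1, 100), sens, fpr)  # prior is P(D)
after_two = update(after_one, sens, fpr)         # prior is the first posterior

print(after_one)  # 22/25, i.e. 0.88
print(after_two)  # 5324/5325
```

The second call reuses the same test characteristics and only swaps the prior, which is exactly the step from $D$ to $D^*$ described above.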

2
On

You compute $P(TT)$ the same way you computed $P(T)$ - using the Law of total probability: $$P(TT)=P(TT|D)P(D)+P(TT|\neg D)P(\neg D)=0.8^2\times 0.01 + P(T|\neg D)^2\times 0.99$$

Alas, I cannot quite figure out what $P(T|\neg D)$ in your problem statement is.

0
On

This is an interesting one. The two tests are independent only once you condition on the disease status; unconditionally, the independence does not carry over. What this means is that if you tested positive once, then the next test is also more likely to be positive (can you explain why?).

Thus, to find $P(TT)$, you need to condition first, like sds did in his answer.
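
To see the dependence numerically, here is a rough sketch that compares $P(T)$ with the probability of a second positive given a first one. The variable names are illustrative only, and the false-positive rate $\frac{2}{1815}$ is the value implied by $P(T) = \frac{1}{110}$ and $P(T|D) = 0.8$; the tests are assumed independent given the disease status.

```python
from fractions import Fraction

p_d = Fraction(1, 100)    # P(D)
sens = Fraction(8, 10)    # P(T|D)
fpr = Fraction(2, 1815)   # P(T|not D), implied by P(T) = 1/110

# Condition on the disease status first, then average over it.
p_t = sens * p_d + fpr * (1 - p_d)
p_tt = sens**2 * p_d + fpr**2 * (1 - p_d)

# Probability that the second test is positive given that the first one was.
p_t2_given_t1 = p_tt / p_t

print(p_t, p_t2_given_t1)  # 1/110 426/605
print(p_tt == p_t**2)      # False
```

A first positive raises the chance of a second positive from about 0.009 to about 0.70, because it makes it far more likely that the person is actually diseased.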

To find $P(T|\neg D)$, we can use $P(T) = 1/110$ and $P(T | D) = 0.8$. Then, $$P(T, \neg D) = P(T) - P(T, D) = P(T) - P(T|D)P(D) \approx 0.00909 - 0.008 = 0.00109,$$ and dividing by $P(\neg D) = 0.99$ gives $P(T|\neg D) \approx 0.0011$.

Also, it can be discussed whether test errors are truly independent from one test to another on the same person (since they might depend on certain chemicals in the body, they are likely not), but this discussion is beyond this simple problem.

0
On

What is the conditional probability $\Pr[T \mid \bar D]$; that is, the probability of obtaining a single false positive? This is $$\Pr[T \mid \bar D] = \frac{\Pr[T \cap \bar D]}{\Pr[\bar D]} = \frac{\Pr[T] - \Pr[T \cap D]}{\frac{99}{100}} = \frac{\frac{1}{110} - \frac{1}{100}\frac{8}{10}}{\frac{99}{100}} = \frac{2}{1815}.$$ Then the probability of two successive false positives is $$\Pr[T_1 \cap T_2 \mid \bar D] = \Pr[T \mid \bar D]^2.$$ Therefore the unconditional probability of two positive tests is $$\Pr[T_1 \cap T_2] = \Pr[T_1 \cap T_2 \mid D]\Pr[D] + \Pr[T_1 \cap T_2 \mid \bar D]\Pr[\bar D]$$ and the desired probability is $$\Pr[D \mid (T_1 \cap T_2)] = \frac{\Pr[T \mid D]^2 \Pr[D]}{\Pr[T_1 \cap T_2]}.$$
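
The same formulas extend to any number of positive tests. One way to sketch this, which is not part of the answer above but is algebraically equivalent to it, is the odds form of Bayes' rule: each positive result multiplies the odds of disease by the likelihood ratio $\Pr[T \mid D] / \Pr[T \mid \bar D]$. The function name and parameters below are hypothetical, and the tests are again assumed independent given the disease status.

```python
from fractions import Fraction

def posterior_after_n_positives(n, prior, sens, fpr):
    """P(D | n positive tests), assuming tests are independent given disease status."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = sens / fpr      # P(T|D) / P(T|not D)
    post_odds = prior_odds * likelihood_ratio**n
    return post_odds / (1 + post_odds)

prior = Fraction(1, 100)
sens = Fraction(8, 10)
fpr = Fraction(2, 1815)

for n in range(3):
    print(n, posterior_after_n_positives(n, prior, sens, fpr))
# 0 1/100
# 1 22/25
# 2 5324/5325
```

With these numbers the likelihood ratio is 726, so the odds go from $\frac{1}{99}$ to $\frac{726}{99}$ after one positive and to $\frac{726^2}{99} = 5324$ after two.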