This question is from the textbook "Introduction to Probability - Blitzstein & Hwang."
I was studying for a class when I came across an example problem that I solved, but got a slightly different result than the textbook. Here's the problem in question, paraphrased:
"Fred tests for a disease which afflicts 1% of the population. The test's accuracy is deemed 95%. He tests positive on the first test, but decides to get tested a second time. Unfortunately, Fred tests positive on the second test as well. Find the probability that Fred has the disease, given the evidence."
$\ $
My approach is as follows:
Let $D$ be the event that Fred has the disease, $T_1$ be the event that the first test result is positive, and $T_2$ be the event that the second test is also positive. We want to find $P(D\ |\ T_1,\ T_2)$.
We are also able to condition on $T_1$ (i.e. the event that the first test result is positive). This would give us:
$$P(D\ |\ T_1,\ T_2) = \frac{P(T_2\ |\ D,\ T_1)P(D\ |\ T_1)}{P(T_2\ |\ T_1)}$$
From my calculations:
$$P(T_2\ |\ D,\ T_1)\ =\ P(T_2\ |\ D)\ =\ 0.95$$
$$P(D\ |\ T_1)\ \approx\ 0.16$$
$$P(T_2\ |\ T_1)\ =\ \frac{P(T_1 ,\ T_2)}{P(T_1)}\ =\ \frac{P(T_1,\ T_2,\ D)\ +\ P(T_1,\ T_2,\ D^c)}{P(T_1,\ D)\ +\ P(T_1,\ D^c)}\ =\ \frac{0.0115}{0.059}\ \approx\ 0.19$$
$$P(D\ |\ T_1,\ T_2)\ =\ \frac{0.95\ \times\ 0.16}{0.19}\ =\ 0.8$$
Therefore, I concluded that there is an 80% chance that Fred has the disease, given that both the first and second test results are positive.
$\ $
The problem is that the textbook has taken a different approach of using the odds form of Bayes' rule, which resulted in a conclusion slightly different from mine (0.78), and I'm having trouble understanding how that conclusion came to be.
$\ $
The textbook's approach is as follows:
$$\frac{P(D\ |\ T_1,\ T_2)}{P(D^c\ |\ T_1,\ T_2)}\ =\ \frac{P(D)}{P(D^c)}\ \times\ \frac{P(T_1,\ T_2\ |\ D)}{P(T_1,\ T_2\ |\ D^c)}$$
$$=\ \frac{1}{99}\ \times\ \frac{0.95^2}{0.05^2}\ =\ \frac{361}{99}\ \approx\ 3.646$$
which "corresponds to a probability of 0.78."
$\ $
Here are the specific questions I have:
- Is my approach wrong? A 0.02 difference is a pretty big difference.
- How did the author derive the equation:
$$P(D\ |\ T_1,\ T_2)\ =\ P(D)P(T_1,\ T_2\ |\ D)$$
- What does the author mean when he/she says "3.646 corresponds to a probability of 0.78?"
$\ $
Any feedback is appreciated. Thank you!
I have some quibbles with the way the textbook sets up its question, beginning with the assumption that both positive and negative tests each have the same likelihood to be correct, and even more so the assumption that the outcomes of two tests on the same person are independent in probability. In real life, I would want to explore both of those points further before advising Fred. But let's ignore those objections for the sake of being able to compute something based on the given information, and assume each administration of the test has the same chance to give a correct result, even when two tests are administered one after the other on the same person.
The two methods are equivalent. The apparent discrepancy is due to roundoff.
The textbook finds an odds ratio of $361:99,$ which is exact (insofar as the $1\%$ and $95\%$ are exact). Since this is $P(D \mid T_1,\ T_2) : P(D^C \mid T_1,\ T_2),$ the probability is given by $$ P(D \mid T_1,\ T_2) = \frac{P(D \mid T_1,\ T_2)}{P(D \mid T_1,\ T_2) + P(D^C \mid T_1,\ T_2)} = \frac{361}{361 + 99} \approx 0.78478,$$ which the text rounds to $0.78.$ (Since you asked about this as a separate part of the question, I'll explain in more detail below.)
In your approach, $P(T_1,\ T_2,\ D)\ +\ P(T_1,\ T_2,\ D^c) = 0.0115$ is an exact result, and so is $P(T_1,\ D)\ +\ P(T_1,\ D^c) = 0.059,$ but $0.0115 / 0.059 \approx 0.19492.$ Meanwhile, $P(D \mid T_1) = 0.0095 / 0.059 \approx 0.16102.$ If we carry all these digits into the computation rather than rounding off to two places immediately, we find that $$P(D\mid T_1,\ T_2) = \frac{0.95\times 0.16102}{0.19492} \approx 0.78478.$$ That is, keeping five digits we get the same answer as the textbook method (if it retained five digits), and if we round to two digits only at the end (as the textbook does) we naturally would round the same way, to $0.78.$
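As a quick check of the roundoff explanation, here is a minimal sketch that redoes the two-step calculation with exact fractions and then with the intermediate values rounded to two places. The variable names are my own, and it assumes (as the problem does) that the test is 95% accurate for both diseased and healthy people and that the two tests are conditionally independent.

```python
from fractions import Fraction

prior = Fraction(1, 100)    # P(D), from the problem statement
acc = Fraction(95, 100)     # P(T | D) = P(T^c | D^c), assumed equal

# P(T1) and P(T1, T2) by the law of total probability
p_t1 = prior * acc + (1 - prior) * (1 - acc)
p_t1_t2 = prior * acc**2 + (1 - prior) * (1 - acc)**2

p_d_given_t1 = prior * acc / p_t1       # about 0.16102
p_t2_given_t1 = p_t1_t2 / p_t1          # about 0.19492

# Exact posterior, no intermediate rounding
exact = acc * p_d_given_t1 / p_t2_given_t1

# Same formula with the intermediate values rounded to two places first
rounded = 0.95 * 0.16 / 0.19

print(exact, float(exact))   # exact fraction and its decimal value
print(rounded)               # close to 0.8
```

The exact value is the same fraction the odds form produces, so the 0.8 comes entirely from rounding $0.16102$ and $0.19492$ too early.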
I think an argument could be made for keeping only one digit of precision in the answer (how precise is that "$1\%$" anyway?), in which case both answers round to $0.8.$
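The author didn't derive $P(D\mid T_1,\ T_2) = P(D)P(T_1,\ T_2\mid D)$; the textbook's equation is a statement about a ratio of two probabilities, not about $P(D\mid T_1,\ T_2)$ by itself. Instead, the fact is that $$P(D\mid T_1,\ T_2) = \frac{P(T_1,\ T_2\mid D)P(D)}{P(T_1,\ T_2)}$$ and $$P(D^C\mid T_1,\ T_2) = \frac{P(T_1,\ T_2\mid D^C)P(D^C)}{P(T_1,\ T_2)}.$$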
When you compute the ratios of the two probabilities $$\frac{P(D\mid T_1,\ T_2)}{P(D^C\mid T_1,\ T_2)},$$ you get factors of $P(T_1,\ T_2)$ in both the numerator and the denominator, and these factors cancel each other.
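The cancellation can also be checked numerically. The sketch below (names are mine, and it again assumes conditionally independent tests with 95% accuracy either way) computes the two posteriors with the $P(T_1,\ T_2)$ denominator included, takes their ratio, and compares it with the odds form that never computes $P(T_1,\ T_2)$ at all.

```python
from fractions import Fraction

p_d = Fraction(1, 100)              # P(D)
like_d = Fraction(95, 100) ** 2     # P(T1, T2 | D), assuming independent tests
like_dc = Fraction(5, 100) ** 2     # P(T1, T2 | D^c)

# Full Bayes' rule, including the evidence term P(T1, T2)
evidence = p_d * like_d + (1 - p_d) * like_dc
post_d = like_d * p_d / evidence
post_dc = like_dc * (1 - p_d) / evidence

# Ratio of the two posteriors: the evidence term cancels
odds_from_posteriors = post_d / post_dc

# Odds form: prior odds times likelihood ratio, no evidence term needed
odds_from_odds_form = (p_d / (1 - p_d)) * (like_d / like_dc)

print(odds_from_posteriors, odds_from_odds_form)
```

Both expressions come out to the same fraction, which is the textbook's $361/99.$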
As I hinted above, $3.646$ is an odds ratio; or as I would rather say, the odds ratio is $3.646 : 1.$ An odds ratio of $1:1$ corresponds to a $50\%$ chance, that is, each possibility is equally likely, whereas a $2:3$ odds ratio describes something that happens twice for every three times it does not happen. In general, if the probability of something is $p,$ its odds ratio is $p : (1 - p),$ that is, $\frac{p}{1 - p} : 1.$
If we say $p = P(D \mid T_1,\ T_2),$ then $P(D^C \mid T_1,\ T_2) = 1 - P(D \mid T_1,\ T_2) = 1 - p,$ and what the textbook has computed is that $$\frac{p}{1 - p} \approx 3.646,$$ that is, on average in situations like this, when both tests come up positive, there will be $3.646$ cases in which the tests were both correct for each case in which both tests were incorrect. That means there are $3.646$ accurate positives for every $3.646 + 1$ times the test comes out positive both times, which gives a probability of $$\frac{3.646}{3.646 + 1} \approx 0.78.$$ The way I worked the probability, however, was to take the fraction $$\frac{p}{1 - p} = \frac{361}{99}$$ and directly extract an odds ratio of $361:99$ from it. This means I can wait until the very end before doing any roundoff, but in other respects it's the same as the textbook's method. In both cases the odds are simply $kp : k(1 - p),$ where $k$ is whatever constant you have to multiply each side by in order to produce either $361:99$ or $3.646:1$ from the odds ratio $p : (1 - p).$
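The conversion from odds to probability described above is a one-liner. Here is a small sketch; `odds_to_probability` is a hypothetical helper name of my own, not something from the textbook.

```python
from fractions import Fraction

def odds_to_probability(a, b=1):
    """Convert odds of a:b in favor into the probability a / (a + b)."""
    return a / (a + b)

# Exact odds 361:99, so no rounding until the very end
p_exact = odds_to_probability(Fraction(361), Fraction(99))

# The textbook's rounded odds 3.646:1
p_from_rounded_odds = odds_to_probability(3.646)

print(p_exact, round(float(p_exact), 2))
print(round(p_from_rounded_odds, 2))

# The two examples from the text: 1:1 odds and 2:3 odds
print(odds_to_probability(1, 1), odds_to_probability(2, 3))
```

Either way of writing the odds, $361:99$ or $3.646:1,$ rounds to the same two-digit probability.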