Why do we need to use Bayes' Theorem for this question?


"A certain disease has an incidence rate of 2%. If the false negative rate is 10% and the false positive rate is 1%, compute the probability that a person who tests positive actually has the disease."

For this question, why do we need to use Bayes' Theorem? I'm having trouble understanding why the answer is not simply 99% (100% - 1%). If we know for sure that the person has tested positive (as stated in the problem), then can't we simply ignore the incidence rate and the false negative rate and only consider false positives? There is only a 1% chance of the test being false positive so there's a 99% chance that the test is correct.




"There is only a 1% chance of the test being false positive so there's a 99% chance that the test is correct."

No. A false positive is when the test is positive even though the person doesn't actually have the disease. Among people who don't have the disease, the complement of a false positive is a true negative (i.e., the test is negative and the person doesn't have the disease).

  • $1\%$ (the false positive rate) represents the probability that the test is incorrect given that the person doesn't have the disease.
  • $99\%$ (the true negative rate) represents the probability that the test is correct given that the person doesn't have the disease.

Neither of these two probabilities is what we want, since we don't know whether or not the person has the disease. All we know is that they tested positive (it could be a true positive or a false positive).


In general, there are four possibilities:

  • True Positive: They had the disease ($0.02$) and the test was positive ($0.9$).
  • False Negative: They had the disease ($0.02$) but the test was negative ($0.1$).
  • False Positive: They didn't have the disease ($0.98$) but the test was positive ($0.01$).
  • True Negative: They didn't have the disease ($0.98$) and the test was negative ($0.99$).
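As a quick sanity check, here is a minimal Python sketch (an illustration, not part of the original answer) that multiplies out the four cases above with exact fractions. The names `incidence`, `fnr` and `fpr` are just labels chosen here for the three rates in the question.

```python
from fractions import Fraction

# Rates from the problem statement (variable names are illustrative labels)
incidence = Fraction(2, 100)   # Pr[disease]
fnr = Fraction(10, 100)        # false negative rate: Pr[negative | disease]
fpr = Fraction(1, 100)         # false positive rate: Pr[positive | no disease]

# Each joint probability is Pr[disease status] * Pr[test result | disease status]
joint = {
    "true positive": incidence * (1 - fnr),
    "false negative": incidence * fnr,
    "false positive": (1 - incidence) * fpr,
    "true negative": (1 - incidence) * (1 - fpr),
}

for outcome, p in joint.items():
    print(f"{outcome:>14}: {float(p):.4f}")

# The four cases are exhaustive and mutually exclusive, so they sum to 1
assert sum(joint.values()) == 1
```

Only the true positive and false positive rows are compatible with a positive test, which is why those two entries are the only ones that appear in the ratio.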

If a test is positive, then it could be a True Positive or a False Positive. We want to find the probability that, out of these two options, it was a True Positive. This yields: $$ \frac{\Pr[\textsf{True Positive}]}{\Pr[\textsf{True Positive or False Positive}]} = \frac{(0.02)(0.9)}{(0.02)(0.9) + (0.98)(0.01)} = \frac{90}{139} = 0.6474 \ldots $$
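The same ratio can be wrapped in a small function. This is only a sketch: `posterior_given_positive` is a hypothetical name, and its arguments are assumed to be the incidence rate, the false negative rate and the false positive rate, in that order.

```python
from fractions import Fraction

def posterior_given_positive(incidence, fnr, fpr):
    """Pr[disease | positive test] via Bayes' theorem (hypothetical helper).

    incidence: Pr[disease]
    fnr: false negative rate, Pr[negative | disease]
    fpr: false positive rate, Pr[positive | no disease]
    """
    true_pos = incidence * (1 - fnr)    # Pr[disease and positive]
    false_pos = (1 - incidence) * fpr   # Pr[no disease and positive]
    return true_pos / (true_pos + false_pos)

p = posterior_given_positive(Fraction(2, 100), Fraction(10, 100), Fraction(1, 100))
print(p, float(p))
```

Using `Fraction` keeps the arithmetic exact, so this prints `90/139` followed by its decimal value, roughly 0.6475.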


I find that a diagram can be helpful for understanding what’s going on.

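[Diagram: a square split by a horizontal line and a vertical line into four regions: white (top left), red (top right), blue (bottom left), purple (bottom right).]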

The large square represents the total population. There are four combinations of outcomes, so we divvy it up into four regions. The area of each region represents the absolute, unconditional probability of the corresponding combination.

The horizontal boundary line divides those who don’t have the disease (the white and red horizontal stripe above the line) from those who do (the blue/purple stripe below it). The area of the bottom stripe is proportional to the incidence rate. For your example, this would be $2\%$ of the total area (obviously exaggerated here).

The vertical line, on the other hand, reflects the test’s accuracy. To the left (the white/blue stripe) it gives the correct result; to the right (the red/purple stripe) an incorrect result. As with the incidence rate, the area of the left-hand stripe is proportional to the overall accuracy of the test. For your example, this would be $99\%$ of the width of the square (at least for people without the disease), leaving a thin red/purple sliver on the right to represent incorrect results.

The red region thus represents false positives: the test result was positive but the patient doesn’t actually have the disease. One somewhat surprising result should be apparent from this diagram: the rarer the disease, the more false positives there are relative to true positives. Moving the horizontal boundary down enlarges the red area and shrinks the blue one.
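To put numbers on this, the sketch below assumes the question’s $10\%$ false negative and $1\%$ false positive rates stay fixed and varies only the incidence rate. The helper name `posterior_given_positive` and the particular incidence values are illustrative choices.

```python
def posterior_given_positive(incidence, fnr=0.10, fpr=0.01):
    # Hypothetical helper: Pr[disease | positive], i.e. blue / (blue + red).
    # fnr and fpr default to the rates given in the question.
    true_pos = incidence * (1 - fnr)    # blue area
    false_pos = (1 - incidence) * fpr   # red area
    return true_pos / (true_pos + false_pos)

# Moving the horizontal boundary down = lowering the incidence rate
for incidence in (0.20, 0.05, 0.02, 0.005, 0.001):
    print(f"incidence {incidence:6.1%}: Pr[disease | positive] = "
          f"{posterior_given_positive(incidence):.3f}")
```

Under these assumptions the probability falls from about 0.96 at $20\%$ incidence to about 0.08 at $0.1\%$ incidence, even though the test itself never changes.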

When computing conditional probabilities, however, we’re interested not in a region’s share of the whole square but in its share of some subset of the square. If we know that the test was positive, that eliminates the white and purple areas: the test result is, whether correct or not, negative for those regions. So, for the conditional probability that the patient has the disease given that the test was positive, we want the ratio of the blue region to whatever’s left, which is the union of the blue and red regions. When the disease is rare, the red region can be comparable to, or even larger than, the blue one, so this ratio can fall well short of the $99\%$ that you proposed; equivalently, the chance that a positive result is wrong can be far more than $1\%$. Bayes’ theorem gives you a concrete way to compute this relative area ratio.
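One way to convince yourself that this ratio of areas really is the conditional probability is to simulate it. The following is a rough Monte Carlo sketch; the function name `simulate`, the seed and the population size are arbitrary choices for illustration. It draws a population with the question’s rates, keeps only the people who test positive (discarding the white and purple regions), and reports the fraction of them who are sick.

```python
import random

def simulate(n, incidence=0.02, fnr=0.10, fpr=0.01, seed=0):
    """Estimate Pr[disease | positive] from n simulated people (illustrative)."""
    rng = random.Random(seed)
    blue = red = 0  # counts of true positives and false positives
    for _ in range(n):
        has_disease = rng.random() < incidence
        if has_disease:
            positive = rng.random() >= fnr  # test misses the disease with prob. fnr
        else:
            positive = rng.random() < fpr   # test wrongly flags with prob. fpr
        if positive:
            if has_disease:
                blue += 1
            else:
                red += 1
    # Share of the positive results (blue + red) that are true positives
    return blue / (blue + red)

print(simulate(500_000))
```

With half a million simulated people the estimate should land within about a percentage point of $90/139 \approx 0.6475$; the remaining gap is sampling noise.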

Note that for simplicity this diagram assumes that the test’s accuracy is uniform, that is, that whether or not the test returns the correct result isn’t affected by whether or not the patient has the disease. That isn’t quite true in your example: the test is correct $90\%$ of the time for patients with the disease and $99\%$ of the time for those without. When the accuracy differs like this, the four regions won’t be bounded by a single straight vertical line as I’ve drawn, but the basic ideas are unchanged: there are four regions that represent the four possible combinations, and the conditional probability that a patient has the disease given that the test was positive is the ratio of the area of the blue region to the sum of the areas of the blue and red regions.
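To illustrate this caveat, here is a small sketch (the helper names `region_areas` and `blue_share_of_positives` are hypothetical) that computes the four region areas with separate accuracies for sick and healthy patients. It then compares the blue share of positives under a uniform $99\%$ accuracy with the question’s actual $90\%$/$99\%$ split.

```python
def region_areas(incidence, acc_sick, acc_healthy):
    """Areas of the four regions as fractions of the square (hypothetical helper).

    acc_sick: Pr[correct result | disease]     (sensitivity)
    acc_healthy: Pr[correct result | no disease] (specificity)
    With acc_sick == acc_healthy the vertical boundary is one straight line.
    """
    return {
        "white (true negative)": (1 - incidence) * acc_healthy,
        "red (false positive)": (1 - incidence) * (1 - acc_healthy),
        "blue (true positive)": incidence * acc_sick,
        "purple (false negative)": incidence * (1 - acc_sick),
    }

def blue_share_of_positives(areas):
    # Pr[disease | positive] = blue / (blue + red)
    blue = areas["blue (true positive)"]
    red = areas["red (false positive)"]
    return blue / (blue + red)

# Uniform 99% accuracy versus the question's actual 90% / 99% split
uniform = region_areas(0.02, 0.99, 0.99)
actual = region_areas(0.02, 0.90, 0.99)
print(f"uniform: {blue_share_of_positives(uniform):.4f}")  # 99/148
print(f"actual:  {blue_share_of_positives(actual):.4f}")   # 90/139
```

The two answers differ only modestly here because the red region, which drives most of the result, is the same in both cases; only the blue region shrinks when sensitivity drops to $90\%$.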