Question about the famous 'statistics in medicine' puzzle

Context of the question:

Suppose a test detects a disease with 99% accuracy, and it is known that one in 10,000 people suffers from this disease. If I test positive, the riddle asks: "What are the odds that I am actually sick?"

One can make a table like this:

$$ \begin{array}{c|lcr} & \text{Infected} & \text{Healthy} \\ \hline \text{Positive test} & \frac{1}{10000}\times\frac{99}{100} & \frac{9999}{10000}\times\frac{1}{100} \\ \text{Negative test} & \frac{1}{10000}\times\frac{1}{100} & \frac{9999}{10000}\times\frac{99}{100}\\ \end{array} $$

The answer of the riddle is: $$\frac{\frac{1}{10000}\times\frac{99}{100}}{\frac{1}{10000}\times\frac{99}{100}+\frac{9999}{10000}\times\frac{1}{100}}=\frac{1}{102}$$

Now here is my question:

If I test positive, then I am either infected or healthy; there are no other options. Therefore I would think that the combined odds add up to one. Then how come that the odds of positive test/infected + positive test/healthy (the values of the first row) do not add up to 1?

Thanks!

4
On BEST ANSWER

You are making 2 mistakes:

First, you are right that if I test positive, then there is a 100% chance that I am either healthy or infected. But what that means is that $P(healthy|positive)+P(infected|positive)=1$, whereas you are describing this as $P(positive|healthy) + P(positive|infected)$.

Second, you point to the top row as the values of $P(healthy|positive)$ and $P(infected|positive)$, but those are the values of $P(healthy \land positive)$ and $P(infected \land positive)$, and those will only add up to $P(positive)$, which is indeed not equal to $1$.

Finally, note that what you calculated as $\frac{1}{102}$ is $P(infected|positive)$. Use the same method to compute $P(healthy|positive)$: change the numerator to $\frac{9999}{10000} \times \frac{1}{100}$, while the denominator, which is $P(positive)$, stays the same. You will find that $P(healthy|positive)=\frac{101}{102}$, and now you see that the two do add up to $1$.

In short: $P(positive|healthy)$, $P(healthy|positive)$, and $P(positive \land healthy)$ are three different things! (Though we do have that $P(positive \land healthy)=P(healthy \land positive)$.)
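If you want to check these three quantities numerically, here is a minimal Python sketch using exact fractions. It assumes, as the table in the question does, that "99% accuracy" means the test is right 99% of the time for both infected and healthy people; the variable names are only illustrative.

```python
from fractions import Fraction

# Numbers from the question. Assumption: "99% accuracy" applies equally
# to infected and healthy people, as in the question's table.
p_infected = Fraction(1, 10000)
p_healthy = 1 - p_infected
p_pos_given_infected = Fraction(99, 100)
p_pos_given_healthy = Fraction(1, 100)

# Joint probabilities: these are the entries in the top row of the table.
p_pos_and_infected = p_infected * p_pos_given_infected
p_pos_and_healthy = p_healthy * p_pos_given_healthy

# The top row sums to P(positive), not to 1.
p_pos = p_pos_and_infected + p_pos_and_healthy

# Conditional probabilities given a positive test: divide by P(positive).
p_infected_given_pos = p_pos_and_infected / p_pos
p_healthy_given_pos = p_pos_and_healthy / p_pos

print("P(positive | healthy)   =", p_pos_given_healthy)
print("P(healthy | positive)   =", p_healthy_given_pos)
print("P(positive and healthy) =", p_pos_and_healthy)
print("P(infected | positive) + P(healthy | positive) =",
      p_infected_given_pos + p_healthy_given_pos)
```

Using `Fraction` rather than floats keeps the results exact, so the comparison with $\frac{1}{102}$ and $\frac{101}{102}$ is not blurred by rounding.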

2
On

The statement "If I test positive, then I am either infected, or I am healthy" corresponds to $P(infected|positive) + P(healthy|positive) = 1$; the event that you're conditioning on is "positive", not "infected" or "healthy".

0
On

There’s a subtle but important difference between the probability of the event “tested positive and is infected” and the conditional probability of the event “is infected given that the test came back positive.” In the latter, you’re restricting your attention to that subset of patients who have tested positive. Within that restricted sample space, there are indeed only two possibilities, and the conditional probabilities sum to one as you expect. In the former, you’re also including patients who tested negative in the sample space, so there are four possible outcomes, not two. If you add up the probabilities in the table, you’ll find that they do add up to one as well.
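To make this concrete with the numbers from the question (writing each table entry over a common denominator of $10^6$), all four joint probabilities sum to one, and so do the two conditional probabilities once the top row is divided by its total $\frac{10098}{10^6}$:

$$\frac{99}{10^6}+\frac{9999}{10^6}+\frac{1}{10^6}+\frac{989901}{10^6}=1, \qquad \frac{99}{10098}+\frac{9999}{10098}=\frac{1}{102}+\frac{101}{102}=1$$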

0
On

Your assumption that any row or column in this table will add up to 1 is simply incorrect.

The table represents the probability space of the $2 \times 2$ combinations of events (sick/healthy and positive/negative), so it is the table as a whole that adds up to 1. This implies that a single row or column adds up to 1 only if all the other rows or columns are 0 (i.e. the uninteresting case).

You might still use the table to make statements about a single row or column, but you must then normalise the values (divide each one by the total of that row or column).

To gain an intuition for this, it might be helpful to consider a sample space rather than a probability space. This can be done simply by dropping the divisor (10000) from your example, so that your table represents how a typical population of 10,000 people divides over the 4 categories. (Because you no longer divide by 10000, it becomes even clearer that the expectation to add up to 1 cannot hold.)
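Here is a rough Python sketch of that head-count view. As an assumption for illustration, it uses a population of 1,000,000 rather than 10,000, purely so that every cell is a whole number of people; the names `population` and `counts` are made up for this example.

```python
from fractions import Fraction

# Assumption: a hypothetical population of 1,000,000, chosen so that
# every cell of the table is a whole number of people.
population = 1_000_000
infected = population // 10_000      # 1 in 10,000 is infected
healthy = population - infected

# Head counts for the four cells of the table (99% of tests are correct).
counts = {
    ("positive", "infected"): infected * 99 // 100,
    ("positive", "healthy"): healthy * 1 // 100,
    ("negative", "infected"): infected * 1 // 100,
    ("negative", "healthy"): healthy * 99 // 100,
}

# The whole table accounts for everyone in the population.
total = sum(counts.values())

# The "positive" row only accounts for the people who tested positive.
positive_total = counts[("positive", "infected")] + counts[("positive", "healthy")]

# Normalising by the row total gives the share of positives who are infected.
share_infected = Fraction(counts[("positive", "infected")], positive_total)

print("whole table:", total)
print("positive row:", positive_total)
print("infected among positives:", share_infected)
```

Counting people makes it plain that the top row is just "everyone who tested positive", a small slice of the population, and that a question about that slice has to be answered by dividing by the size of the slice.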

And to say it in yet another way: you ask

Then how come that the odds of positive test/infected + positive test/healthy (the values of the first row) do not add up to 1?

They add up to the chance that you have a positive test.
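With the numbers from the question, that chance works out to roughly 1%:

$$P(\text{positive}) = \frac{1}{10000}\times\frac{99}{100} + \frac{9999}{10000}\times\frac{1}{100} = \frac{10098}{1000000} \approx 0.0101$$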