Probability of breast cancer

I'm having trouble with a probability problem I've been trying to solve for a while. It's about the accuracy of breast cancer testing. Relevant probabilities are listed below, where:

  1. "$\text{cancer}$" is the event "has breast cancer".
  2. "$+$" is the event "tests positive for breast cancer".

$P(\text{cancer}) = \frac{12}{1000}$

$P(+|\text{cancer}) = \frac{11}{12}$

$P(+) = \frac{31}{1000}$

$P(\text{cancer}|+) = 0.355$

This last line is a result from a previous problem. The next part involves updating the probability of having cancer, but I'm having trouble figuring out what the answer is.

In the next part of the question, there's a woman who has tested positive and her doctor says she's part of a population for which there's a 40% chance of breast cancer.

I need to find the probability that the woman has cancer.

I am confused by this update to the cancer probability, but I will assume that this means $P(\text{cancer})$ has changed.

I also assume this means I need to find a new value for $P(\text{cancer}|+)$, but I'm not getting this right.

$P(+ | \text{cancer}) = \frac{11}{12} = \frac{P(\text{cancer} | +)\cdot P(+)}{P(\text{cancer})} = \frac{P(\text{cancer} | +) \cdot \frac{31}{1000}}{0.40}$

$P(\text{cancer} | +) = \frac{11}{12} \cdot 0.40 \cdot \frac{1000}{31} = 11.828$

The result can't be correct because it's way over 1.

How can I fix this? Thank you in advance for any insight.

There are 4 best solutions below

3
On

It is natural to assume that the previous value of $P(+)$ no longer applies here: a test that comes back positive only $3.1\%$ of the time in a population where the true prevalence is $40\%$ would be a very bad test. Moreover, keeping it contradicts the condition $P(+ | cancer) = \frac{11}{12}$, because then $P(+)$ is at least $\frac{11}{12}\cdot 0.4>0.031$.

It is also natural to assume that the values of $P(+|cancer)$ and $P(cancer|+)$ reflect the efficiency and reliability of the test. I expect that a testing procedure deals with an isolated sample, so it is independent of how widespread cancer is. But if we keep both of these values, then the probability $P'(cancer|+)$ that the woman has breast cancer is simply $P(cancer|+)$, and the information $P'(cancer)=0.4$ is redundant.

So we assume that the testing procedure provides only $P(+|cancer)$ and $P(+|\neg cancer)$. Then from the given probabilities we have

$$\frac{31}{1000}=P(+)=P(+|cancer)P(cancer)+ P(+|\neg cancer)P(\neg cancer)= \frac{11}{12}\cdot \frac{12}{1000}+ P(+|\neg cancer) \cdot \frac{988}{1000},$$

so $P(+|\neg cancer)=\frac 5{247}$.

Then

$$P'(+)=P(+|cancer)P'(cancer)+ P(+|\neg cancer)P'(\neg cancer)= \frac{11}{12}\cdot 0.4+\frac 5{247}\cdot 0.6=\frac {2807}{7410}.$$

Since $P'(cancer|+) P'(+)=P'(cancer\, \&\, +)= P'(+|cancer) P'(cancer),$ we have

$$P'(cancer|+)=\frac{ P'(+|cancer) P'(cancer)}{P'(+)}=\frac{\frac{11}{12}\cdot 0.4}{\frac {2807}{7410}}=\frac {2717}{2807}\approx 0.968.$$
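The two steps above (recover the false positive rate from the general population, then reapply Bayes with the new prior) can be checked with exact arithmetic. This is a minimal sketch; the function names `false_positive_rate` and `posterior` are illustrative choices, not part of the original problem.

```python
from fractions import Fraction


def false_positive_rate(p_pos, sensitivity, prior):
    """Solve P(+) = sensitivity*prior + fpr*(1 - prior) for fpr = P(+|not cancer)."""
    return (p_pos - sensitivity * prior) / (1 - prior)


def posterior(prior, sensitivity, fpr):
    """Bayes' theorem: P(cancer|+) for a given prior P(cancer)."""
    p_pos = sensitivity * prior + fpr * (1 - prior)  # law of total probability
    return sensitivity * prior / p_pos


sensitivity = Fraction(11, 12)  # P(+|cancer), assumed unchanged between populations

# Step 1: false positive rate from the general-population figures.
fpr = false_positive_rate(Fraction(31, 1000), sensitivity, Fraction(12, 1000))

# Step 2: posterior for the 40% population.
new_posterior = posterior(Fraction(2, 5), sensitivity, fpr)

print(fpr)                   # 5/247
print(new_posterior)         # 2717/2807
print(float(new_posterior))  # about 0.968
```

Using `Fraction` keeps every intermediate value exact, so the result can be compared directly with the hand-derived fractions.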

0
On

By Bayes' Theorem:

$$P(cancer|+) = \frac{P(+|cancer) P(cancer)}{P(+)}$$

where
$$P(+) = P(+|cancer)P(cancer)+ P(+|\neg cancer)P(\neg cancer),$$
$$P(+|cancer) = \frac{11}{12}, \qquad P(cancer) = \frac{4}{10}.$$
Therefore,
$$P(cancer|+) = \frac{\frac{11}{12} \cdot \frac{4}{10}}{\frac{11}{12} \cdot \frac{4}{10} + \frac{6}{10} \cdot P(+|\neg cancer)}.$$

To find the false positive rate of the test, $P(+|\neg cancer)$, we can use the information from the general population: $P(+)= \frac{31}{1000}$ and $P(cancer) = \frac{12}{1000}$. Then
$$ \frac{31}{1000} = \frac{11}{12} \cdot \frac{12}{1000} + P(+|\neg cancer) \cdot \frac{988}{1000}. $$
Rearranging gives
$$ P(+|\neg cancer) = \frac{5}{247}.$$

Plugging this back into the previous equation gives
$$P(cancer|+) = \frac{2717}{2807} \approx 0.968.$$
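As a cross-check, the same number falls out of the odds form of Bayes' theorem. This is an alternative route, not the one used above; it assumes the same sensitivity $\frac{11}{12}$ and false positive rate $\frac{5}{247}$. The likelihood ratio of a positive test is

$$\frac{P(+|cancer)}{P(+|\neg cancer)} = \frac{11/12}{5/247} = \frac{2717}{60},$$

and the prior odds are $\frac{0.4}{0.6} = \frac{2}{3}$, so the posterior odds are

$$\frac{2}{3}\cdot\frac{2717}{60} = \frac{2717}{90}, \qquad P(cancer|+) = \frac{2717}{2717+90} = \frac{2717}{2807} \approx 0.968.$$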

0
On

Just to check the figures given early in your question:

\begin{align*} P(\text{cancer}|+) &= \frac{P(+|\text{cancer}) \cdot P(\text{cancer})}{P(+)} \\ &= \frac{\frac{11}{12} \cdot \frac{12}{1000}}{\frac{31}{1000}} \\ &= \frac{11}{31} \\ &\approx 0.355 \end{align*}

So at least that part is correct.

Now, what happens when you change $P(\text{cancer})$ to $0.40$, but keep $P(+|\text{cancer}) = \frac{11}{12}$ and $P(+) = \frac{31}{1000}$? You have already calculated this, but there is a more direct way of writing your calculation:

\begin{align*} P(\text{cancer}|+) &= \frac{P(+|\text{cancer}) \cdot P(\text{cancer})}{P(+)} \\ &= \frac{\frac{11}{12} \cdot 0.40}{\frac{31}{1000}} \\ &= \frac{11}{12} \cdot 0.40 \cdot \frac{1000}{31} \\ &\approx 11.828 \end{align*}

This calculation shows that this combination of probabilities does not work. This conclusion is correct, but why?

Here is a simple explanation: Increasing $P(\text{cancer})$ but keeping $P(+|\text{cancer})$ the same increases $P(\text{cancer} \cap +)$. In fact, in this case,

\begin{align*} P(\text{cancer} \cap +) &= P(\text{cancer}) \cdot P(+|\text{cancer}) \\ &= 0.40 \cdot \frac{11}{12} \\ &> \frac{1}{3} \\ &\gg \frac{31}{1000} \\ &= P(+) \end{align*}
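The inequality above is a necessary condition that any set of inputs must satisfy: the joint probability $P(\text{cancer} \cap +)$ can never exceed $P(+)$. A small sketch of that check follows; the helper name `is_consistent` is made up for illustration.

```python
from fractions import Fraction


def is_consistent(prior, sensitivity, p_pos):
    """Necessary condition only: P(cancer and +) must not exceed P(+)."""
    joint = prior * sensitivity  # P(cancer and +)
    return joint <= p_pos


sens = Fraction(11, 12)      # P(+|cancer)
p_pos = Fraction(31, 1000)   # P(+) from the general population

original_ok = is_consistent(Fraction(12, 1000), sens, p_pos)  # prior 1.2%
updated_ok = is_consistent(Fraction(2, 5), sens, p_pos)       # prior 40%

# Largest prior that this P(+) and sensitivity could support.
max_prior = p_pos / sens

print(original_ok, updated_ok)  # True False
print(float(max_prior))         # about 0.034
```

With $P(+)$ held at $\frac{31}{1000}$, no prior above roughly $3.4\%$ passes the check, which is why plugging in $0.40$ produced a "probability" above 1.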

So what should you do? It’s hard to tell from such a vague question. My guess is to keep $P(+|\text{cancer})$ and $P(+|\neg \text{cancer})$ the same as they were in the original situation (because these should depend only on the test and therefore be independent of the cancer distribution):

\begin{align*} P(+|\text{cancer}) &= \frac{11}{12} \\ P(\neg \text{cancer}) &= 1 - P(\text{cancer}) \\ &= \frac{988}{1000} \\ P(\text{cancer} \cap +) &= P(\text{cancer}) \cdot P(+|\text{cancer}) \\ &= \frac{12}{1000} \cdot \frac{11}{12} \\ &= \frac{11}{1000} \\ P(\neg \text{cancer} \cap +) &= P(+) - P(\text{cancer} \cap +) \\ &= \frac{31}{1000} - \frac{11}{1000} \\ &= \frac{20}{1000} \\ P(+|\neg \text{cancer}) &= \frac{P(\neg \text{cancer} \cap +)}{P(\neg \text{cancer})} \\ &= \frac{\left( \frac{20}{1000} \right)}{\left( \frac{988}{1000} \right)} \\ &= \frac{5}{247} \end{align*}

(There’s a big red flag here: apparently, this test has performed the miracle of minimising both the false positive and false negative rates. But I see nothing better, so I will continue on my original path.)

Applying these values of $P(+|\text{cancer})$ and $P(+|\neg \text{cancer})$ to the new value of $P(\text{cancer}) = 0.40$ gives a new value of $P(+)$:

\begin{align*} P(\neg \text{cancer}) &= 1 - P(\text{cancer}) \\ &= 0.60 \\ P(\text{cancer} \cap +) &= P(\text{cancer}) \cdot P(+|\text{cancer}) \\ &= 0.40 \cdot \frac{11}{12} \\ &= \frac{11}{30} \\ P(\neg \text{cancer} \cap +) &= P(\neg \text{cancer}) \cdot P(+|\neg \text{cancer}) \\ &= 0.60 \cdot \frac{5}{247} \\ &= \frac{3}{247} \\ P(+) &= P(\text{cancer} \cap +) + P(\neg \text{cancer} \cap +) \\ &= \frac{11}{30} + \frac{3}{247} \\ &= \frac{2807}{7410} \end{align*}

Now we can apply Bayes’ theorem with this new value of $P(+)$:

\begin{align*} P(\text{cancer}|+) &= \frac{P(+|\text{cancer}) \cdot P(\text{cancer})}{P(+)} \\ &= \frac{\frac{11}{12} \cdot 0.40}{\frac{2807}{7410}} \\ &\approx 0.968 \end{align*}

Finally, let’s think about whether this answer is reasonable. Since our miracle test has a low rate of false results, we would expect:

$$P(\text{cancer}|+) \approx 1 \tag{1}$$

With the original value of $P(\text{cancer}) = \frac{12}{1000}$, the approximation $(1)$ was false, because $P(\neg \text{cancer})$ was so high that even a low false positive rate produced many false positives relative to true positives: $\frac{20}{1000}$ of those tested versus $\frac{11}{1000}$.

But with the new value of $P(\text{cancer}) = 0.40$, the approximation $(1)$ is true. Or, at least, it’s a much better approximation than it was before.

So it looks like this answer is reasonable.
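To see how strongly the result depends on the prior, here is a rough sketch that sweeps $P(\text{cancer})$ while holding the test's rates fixed at $\frac{11}{12}$ and $\frac{5}{247}$. The intermediate priors 0.05 and 0.20 are arbitrary illustration values, not part of the question.

```python
def posterior(prior, sensitivity=11 / 12, fpr=5 / 247):
    """P(cancer|+) by Bayes' theorem, with the test's rates held fixed."""
    p_pos = sensitivity * prior + fpr * (1 - prior)  # law of total probability
    return sensitivity * prior / p_pos


# 0.012 and 0.40 come from the question; 0.05 and 0.20 are illustrative only.
priors = (0.012, 0.05, 0.20, 0.40)
results = [posterior(p) for p in priors]

for prior, post in zip(priors, results):
    print(f"P(cancer) = {prior:.3f} -> P(cancer|+) = {post:.3f}")
```

The first and last rows reproduce the two values computed above (about 0.355 and 0.968), and the posterior rises steadily in between.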

0
On

I'll try to make this easy, by "normalizing" things. Let's say there are $12000$ people (this number is going to make things be integers--for this first part anyway). Here's what we know:

  • $\frac{12}{1000}$ have cancer: $144$ people
  • 11 of 12 people with cancer test positive for cancer: $132$ positive tests and $12$ false negatives
  • $\frac{31}{1000}$ test positive for cancer: $372$ positives (means $240$ false positives)
  • Means $p(\text{cancer}|+) = \frac{132}{372} = \frac{11}{31} \approx 0.355$ (and this result is from Bayes' Theorem: $p(\text{cancer}|+) = \frac{p(+ | \text{cancer})p(\text{cancer})}{p(+)} = \left(\frac{11}{12}\cdot \frac{12}{1000}\right)\cdot\frac{1000}{31}$)

So now you're telling me the woman falls into a group whose cancer rate isn't the average ($1.2\%$) but much higher: $40\%$. There is one definite real-world assumption we'll have to make: that the accuracy of the test doesn't change. That may not hold in the real world, since different groups are likely to get different accuracies for a test.

So what do I mean by "same accuracy"? There are two possible outcomes to a test:

  1. $\text{# positive results} = \text{# true positives} +\text{# false positives}$
  2. $\text{# negative results} = \text{# true negatives} + \text{# false negatives}$

The rates of false negatives and false positives should remain the same (the number of true negatives and true positives will be determined by the population). This is the assumption that the test will have the same accuracy.

So now let's proceed as above, except we need to find the number of positive tests instead of it being a given:

  • $\frac{4}{10}$ have cancer: $4800$ people
  • $11$ of $12$ people with cancer still test positive: $4400$ positive tests ($400$ false negatives--note the false negative rate remains the same through this assumption).

Next, we need to predict how many false positives we'll get, assuming they occur at the same rate. Now think about this: where do false positives come from? They come from people who should test negative. In the original example, $11856$ people did not have cancer, and $240$ of them tested positive, so the false positive rate was $\frac{240}{11856} = \frac{5}{247}$. This represents $p(+|\neg \text{cancer})$. Solving for it from the probabilities can be slightly tricky (although the above should give some insight). Writing $x = p(+|\neg\text{cancer})$:

\begin{align*} p(+) &= p(+|\text{cancer})p(\text{cancer}) + p(+|\neg\text{cancer})p(\neg\text{cancer})\\ \frac{31}{1000} &= \frac{11}{12}\cdot\frac{12}{1000} + x\cdot\left(1 - \frac{12}{1000}\right) \end{align*}

Let's just get rid of the $1000$:

\begin{align*} 31 = 11 + 1000x - 12x \leadsto 20 = 1000x - 12x &&\text{divide everything by 4}\\ 5 = 250x - 3x \leadsto x = \frac{5}{247} && \text{q.e.d.} \end{align*}

OK, back to the example. We know $\frac{5}{247}$ of the people without cancer will test (falsely) positive, so now we find how many there are:

$$ \frac{5}{247}\cdot 7200 \approx 145.75\ \text{false positives} $$

Giving the total number of positives at approximately $4400 + 145.75 = 4545.75$. So now we do just as we did above: we have $4400$ true positives out of $4545.75$ total positives:

$$ p(\text{cancer}|+) \approx \frac{4400}{4545.75} \approx 96.79\% $$
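The head-count version of the calculation can be reproduced in a few lines. This is a sketch under the same assumption as above, that the test's rates carry over to the new group; the function name `count_table` and the dictionary keys are made up for illustration.

```python
from fractions import Fraction


def count_table(population, prior, sensitivity, fpr):
    """Expected head counts for each test outcome in a population."""
    with_cancer = population * prior
    without_cancer = population - with_cancer
    true_pos = with_cancer * sensitivity
    false_neg = with_cancer - true_pos
    false_pos = without_cancer * fpr
    return {
        "with_cancer": with_cancer,
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "p_cancer_given_pos": true_pos / (true_pos + false_pos),
    }


table = count_table(12000, Fraction(2, 5), Fraction(11, 12), Fraction(5, 247))

print(table["true_pos"], table["false_neg"])  # 4400 400
print(float(table["false_pos"]))              # about 145.75
print(float(table["p_cancer_given_pos"]))     # about 0.9679
```

The false positive count is not a whole number because $12000$ was chosen to make the original $1.2\%$ population come out in integers, not the $40\%$ one.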

But you can see from how we found the false positive rate, how we would solve this using the probabilities:

\begin{align*} p(+) =&\ p(+|\text{cancer})p(\text{cancer}) + p(+|\neg\text{cancer})p(\neg\text{cancer}) \\ =&\ \frac{11}{12}\cdot\frac{2}{5} + \frac{5}{247}\cdot\frac{3}{5} \\ =&\ \frac{11}{30} + \frac{3}{247} = \frac{2807}{7410} \end{align*}

And finally, using Bayes' Theorem again:

\begin{align*} p(\text{cancer}|+) =&\ \frac{p(+|\text{cancer})p(\text{cancer})}{p(+)} = \frac{\frac{11}{12}\cdot\frac{2}{5}}{\frac{2807}{7410}} \\ =&\ \frac{11}{30}\cdot\frac{7410}{2807} = \frac{11\cdot 247}{2807} \\ =&\ \frac{2717}{2807} \approx 96.79 \% \end{align*}

A note on False Positives/Negatives

If you think about a terminal (or rare) medical diagnosis, you might think that $p(\text{cancer}|+) = 35.5\%$ indicates an inaccurate test. But you can see that it actually doesn't (when you take mostly cancer patients, the test identifies them with a much higher probability). The share of false positives among positive results is "high", but what we should actually care about is the false negatives. If you calculate it (from my normalized example), there were $12$ false negatives out of a total of $11,628$ negatives ($12000 - 372$ negatives), so the rate of false negatives among negative results is $\approx 0.10\%$:

\begin{align*} p(\text{cancer}) =&\ p(\text{cancer}|+)p(+) + p(\text{cancer}|-)p(-) \\ p(\text{cancer}|-) =&\ \frac{p(\text{cancer}) - p(\text{cancer}|+)p(+)}{1 - p(+)} \\ =&\ \frac{\frac{12}{1000} - \frac{11}{31}\cdot\frac{31}{1000}}{\frac{969}{1000}}\\ =&\ \frac{1}{969} \approx 0.103199174407\% \end{align*}

This is important because it should be less dangerous to investigate further after a false positive than to be wrong and let the disease continue after a false negative.
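The value $p(\text{cancer}|-) = \frac{1}{969}$ can be recomputed directly from the test's rates, and the same function shows what a negative result would mean in the $40\%$ group. This is a sketch that again assumes the sensitivity and false positive rate carry over unchanged; the name `p_cancer_given_negative` is illustrative.

```python
from fractions import Fraction


def p_cancer_given_negative(prior, sensitivity, fpr):
    """P(cancer|-): the share of negative results that are wrong."""
    missed = (1 - sensitivity) * prior  # P(cancer and -)
    cleared = (1 - fpr) * (1 - prior)   # P(no cancer and -)
    return missed / (missed + cleared)


sens, fpr = Fraction(11, 12), Fraction(5, 247)

general = p_cancer_given_negative(Fraction(12, 1000), sens, fpr)
high_risk = p_cancer_given_negative(Fraction(2, 5), sens, fpr)

print(general, float(general))      # 1/969, about 0.00103
print(high_risk, float(high_risk))
```

Just as a positive result means more in the high-prevalence group, a negative result is less reassuring there, so the two populations should not be read with the same rule of thumb.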