Interpretable Explanation of Conditional Probability Problem

I gave my students a conditional probability question.

The essential information: we have an email filter that sends 99.9% of junk emails to the spam folder, but it also sends 7% of legitimate emails there. If 50% of emails are junk and 50% are legitimate, how many of the emails in a spam folder containing 500 emails can we expect to be legitimate?

I did the work using Bayes' theorem and it comes out to roughly 32.74. The calculation speaks for itself, but I hate not being able to give an interpretable reason, beyond the formula, for why the answer makes sense. A few students, of course, answered 35, because that's an interpretable answer.
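A minimal Python sketch of the Bayes'-theorem calculation, assuming only the numbers stated in the problem (50% junk, 99.9% of junk and 7% of legitimate mail sent to spam). The function and parameter names are hypothetical, chosen for illustration.

```python
# Bayes' theorem for the spam-folder problem.
# Hypothetical parameter names (illustration only):
#   p_junk          - prior probability that an email is junk
#   p_spam_if_junk  - P(sent to spam | junk)
#   p_spam_if_legit - P(sent to spam | legit)

def p_legit_given_spam(p_junk: float, p_spam_if_junk: float, p_spam_if_legit: float) -> float:
    """Return P(legit | in spam folder) via Bayes' theorem."""
    p_legit = 1.0 - p_junk
    # Law of total probability: overall chance that an email lands in the spam folder.
    p_spam = p_spam_if_junk * p_junk + p_spam_if_legit * p_legit
    # Bayes: P(legit | spam) = P(spam | legit) * P(legit) / P(spam)
    return p_spam_if_legit * p_legit / p_spam


posterior = p_legit_given_spam(0.5, 0.999, 0.07)
expected_legit = 500 * posterior

print(f"P(legit | spam folder) = {posterior:.5f}")
print(f"Expected legit emails among 500: {expected_legit:.2f}")
```

The student answer of 35 is $500 \times 0.07$, which uses $P(\text{spam}\mid\text{legit}) = 0.07$ as if it were $P(\text{legit}\mid\text{spam})$. The denominator `p_spam` is what corrects for the fact that the folder is mostly junk.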

I already gave them the classic Blue/Green taxi-cab witness problem, which is easier to explain: look at how many Blue cabs he'd call Green versus how many Green cabs he'd call Green, then put your "good" outcomes over the total outcomes, where the only valid outcomes are the ones in which the witness says he saw a green taxi. But I'm having trouble finding a similarly plain, interpretable explanation for the spam-folder problem.
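In the spirit of the taxi-cab approach (count all the outcomes, then keep only the valid ones), a quick simulation can make the answer feel concrete. This is a sketch under stated assumptions, not part of the original question: emails are independent, each one is junk with probability 0.5, and the function name `simulate_spam_folder` is hypothetical.

```python
import random


def simulate_spam_folder(n_emails: int, seed: int = 0) -> float:
    """Estimate the expected number of legit emails in a 500-email spam folder.

    Assumption (illustration only): each simulated email is independently
    junk with probability 0.5.
    """
    rng = random.Random(seed)
    in_spam = 0
    legit_in_spam = 0
    for _ in range(n_emails):
        is_junk = rng.random() < 0.5
        # Filter rates from the problem: 99.9% of junk and 7% of legit go to spam.
        to_spam = rng.random() < (0.999 if is_junk else 0.07)
        if to_spam:
            # Only emails that reach the spam folder are "valid" outcomes.
            in_spam += 1
            if not is_junk:
                legit_in_spam += 1
    # Scale the observed legit fraction of the folder up to 500 emails.
    return 500 * legit_in_spam / in_spam


estimate = simulate_spam_folder(200_000)
print(f"Simulated expected legit emails among 500: {estimate:.2f}")
```

With a fixed seed the run is reproducible; the estimate lands near 32.74 and tightens as `n_emails` grows.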

There are 2 best solutions below

Not sure that this is an improvement over Bayes, but:

Say $2N$ is the total number of emails received. Then we have (on average) $N$ junk emails and $N$ legitimate emails, and the spam folder will contain $.999N+.07N$ emails. Thus we want $$(.999+.07)N=500\implies N=\frac{500}{1.069}\approx 467.7$$

It follows that the spam folder contains about $.07\times 467.7\approx 32.74$ legitimate emails.
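The same scaling argument as a short Python sketch (variable names are illustrative, not from the answer). It also shows where the small gap between 32.74 and 32.76 comes from: rounding $N$ to 468 before multiplying by $.07$.

```python
# Scaling approach: solve (0.999 + 0.07) * N = 500 for N, the number of
# junk (equivalently, legit) emails received, then count the legit ones.
folder_size = 500
rate_junk = 0.999   # fraction of junk mail sent to spam
rate_legit = 0.07   # fraction of legit mail sent to spam

n_each = folder_size / (rate_junk + rate_legit)   # N, about 467.7
legit = rate_legit * n_each                       # legit emails in the folder

print(f"N = {n_each:.1f}")
print(f"Legit emails (unrounded N): {legit:.2f}")
print(f"Legit emails (N rounded to {round(n_each)}): {rate_legit * round(n_each):.2f}")
```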

Working with $10000$ mails so that all the counts stay integers:

A) Expected number of junk mails $= 5000$

Filtered to spam folder $ = 4995$ $\, \, (99.9\%)$

Comes to Inbox $ = 5 \, \, (0.1\%)$

B) Expected number of legit mails $= 5000$

Filtered to spam folder $ = 350$ $\, (7\%)$

Comes to Inbox $ = 4650$ $\, \, (93\%)$

So total number of mails in spam folder $ = 5345$

Legit mails out of those $ = 350$.

If $500$ mails in spam folder, expected number of legit mails = $\displaystyle \frac{350 \times 500}{5345} \approx 32.74$
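The frequency table above, sketched in Python with `fractions.Fraction` so every count stays exact, assuming the same 10000-mail population. The final ratio $350/5345$ is exactly the Bayes posterior $P(\text{legit}\mid\text{spam}) = 0.035/0.5345$, which is one way to show students that the formula is just this counting argument in disguise.

```python
from fractions import Fraction

# Natural-frequency table for 10000 mails (50% junk, 50% legit).
total = 10_000
junk = legit = total // 2

junk_to_spam = junk * Fraction(999, 1000)    # 4995 junk mails filtered to spam
legit_to_spam = legit * Fraction(7, 100)     # 350 legit mails filtered to spam
spam_folder = junk_to_spam + legit_to_spam   # 5345 mails in the spam folder

# Rescale from a 5345-mail spam folder to a 500-mail one.
expected_legit = legit_to_spam * 500 / spam_folder

print(junk_to_spam, legit_to_spam, spam_folder)
print(expected_legit, float(expected_legit))
```

The exact result is $35000/1069 \approx 32.74$.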