The following question is from a past exam paper at my university. I am having difficulty understanding it and coming up with a formula for it:
Suppose that you have received 100 email messages, 90 of which are genuine and 10 of which are spam. Suppose that your email server runs spam filtering software that classified these email messages as follows.
- Out of the 90 genuine messages, 88 are classified as genuine but 2 are wrongly flagged as spam.
- The probability of correct classification of a message is at least 97%.
Using the above data, please answer the three questions below.
- What is the largest possible number of spam messages (out of 10) that the software can incorrectly classify as genuine?
- What is the smallest possible probability that the spam filter flags an email message as spam provided that the message is indeed a spam?
- What is the smallest probability that an email message is spam provided that it is flagged as spam by the spam filter?
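One way to make the wording concrete is to enumerate every outcome that is consistent with the data. The sketch below assumes that "correct classification of at least 97%" means overall accuracy, i.e. at least 97 of all 100 messages are classified correctly. The constant names are illustrative and not part of the question.

```python
from fractions import Fraction

# Illustrative names; the numbers come from the question.
TOTAL = 100
SPAM = 10
GENUINE_OK, GENUINE_FLAGGED = 88, 2
# Assumption: 97% is overall accuracy over all 100 messages.
MIN_ACCURACY = Fraction(97, 100)

feasible = []
for missed in range(SPAM + 1):  # missed = spam classified as genuine
    caught = SPAM - missed  # spam correctly flagged as spam
    accuracy = Fraction(GENUINE_OK + caught, TOTAL)
    if accuracy >= MIN_ACCURACY:
        feasible.append(missed)

# Q1: largest number of spam messages let through as genuine
max_missed = max(feasible)
# Q2: P(flagged | spam) is smallest when the most spam is missed
min_recall = Fraction(SPAM - max_missed, SPAM)
# Q3: P(spam | flagged) = caught / (caught + genuine wrongly flagged)
min_precision = min(
    Fraction(SPAM - m, SPAM - m + GENUINE_FLAGGED) for m in feasible
)

print(max_missed, min_recall, min_precision)
```

Under this reading the script prints `1 9/10 9/11`. If 97% is instead read as a rate within one class, only the accuracy check inside the loop would need to change.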
My reasoning:
We know that the spam filter has a correct classification rate of at least 97%, therefore at most 3 messages out of 100 are misclassified. With this, we know that the largest possible number of genuine messages being misclassified is 3.
We know that 10 messages are indeed spam, and 3 messages might be misclassified, therefore the smallest possible probability is 7/10 or 70%.
We know that 10 messages are indeed spam out of 100 total messages and 3 messages might be misclassified, therefore the smallest possible probability is 7/100 or 7%.
I am not sure whether the above is correct, because of how the question is worded.
What is the easiest way to work out how to solve this from the wording of the question itself? And how could I solve it using conditional probabilities such as Bayes' theorem?
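For the Bayes route, the third question can be computed from the prior $P(\text{spam}) = 10/100$, the false-positive rate $P(\text{flagged} \mid \text{genuine}) = 2/90$ and a value for $P(\text{flagged} \mid \text{spam})$. The sketch below assumes that value is $9/10$, which is what you get if "97% correct" means at least 97 of the 100 messages are classified correctly (so at most one spam message slips through). The variable names are illustrative.

```python
from fractions import Fraction

p_spam = Fraction(10, 100)  # prior: 10 of 100 messages are spam
p_genuine = 1 - p_spam
# Assumption: smallest recall under the overall-accuracy reading.
p_flag_given_spam = Fraction(9, 10)
p_flag_given_genuine = Fraction(2, 90)  # 2 of 90 genuine messages flagged

# Law of total probability: P(flagged)
p_flag = p_flag_given_spam * p_spam + p_flag_given_genuine * p_genuine

# Bayes' theorem: P(spam | flagged)
p_spam_given_flag = p_flag_given_spam * p_spam / p_flag
print(p_spam_given_flag, float(p_spam_given_flag))
```

Here $P(\text{flagged}) = 11/100$ and Bayes' theorem gives $9/11$, the same as counting directly: 9 real spam among the 11 flagged messages.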
This is how my lecturer answered the above question:
$$
\begin{aligned}
0.97 &= \frac{88}{88+x} \\
88 + x &= 90.72 \simeq 91 \\
x &= 91 - 88 = 3
\end{aligned}
$$
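The lecturer's first line reads 97% as the share of genuine messages among those classified as genuine, $88/(88+x)$, where $x$ is the number of spam messages classified as genuine. A small sketch to check that arithmetic, assuming this reading. Apart from $x$, the names are illustrative. It solves the equation exactly and lists the whole numbers $x$ for which the ratio stays at or above $0.97$.

```python
from fractions import Fraction

GENUINE_OK = 88  # genuine messages classified as genuine
TARGET = Fraction(97, 100)  # assumed: 97% applies to the "genuine" side

# Exact solution of 0.97 = 88 / (88 + x)
x_exact = GENUINE_OK / TARGET - GENUINE_OK
print(float(GENUINE_OK / TARGET), float(x_exact))

# Whole numbers of missed spam (0..10) that keep the ratio >= 0.97
ok = [x for x in range(11) if Fraction(GENUINE_OK, GENUINE_OK + x) >= TARGET]
print(ok)
```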
$10 \times 0.97 = 9.7$, which means that 9 out of 10 messages are spam, therefore at least 1 is indeed a spam.
If the $97\%$ applies to the spam side, then $9$ of the $11$ messages flagged as spam are real spam. If it applies to the genuine side, then $10$ of the $12$ flagged messages are real spam.
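Both cases in the last step compute the same ratio, $P(\text{spam} \mid \text{flagged}) = \text{caught spam} / (\text{caught spam} + 2)$, because the 2 wrongly flagged genuine messages are fixed by the data. A short sketch comparing them; the helper `spam_share_of_flagged` is a hypothetical name used only for illustration.

```python
from fractions import Fraction

FLAGGED_GENUINE = 2  # genuine messages wrongly flagged as spam (from the question)


def spam_share_of_flagged(caught_spam: int) -> Fraction:
    """Hypothetical helper: P(spam | flagged) when `caught_spam` spam are flagged."""
    return Fraction(caught_spam, caught_spam + FLAGGED_GENUINE)


spam_side = spam_share_of_flagged(9)  # 9 of 11 flagged messages
genuine_side = spam_share_of_flagged(10)  # 10 of 12 flagged messages
print(spam_side, float(spam_side))
print(genuine_side, float(genuine_side))
```

The smaller of the two, $9/11 \approx 0.818$, is the one with fewer spam messages caught.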