Probabilities - spam filter


The following question is from a past exam paper at my university. I am having difficulty understanding it and coming up with a formula for it:

Suppose that you have received 100 email messages 90 of which are genuine and 10 are spam. Suppose that your email server runs a spam filtering software that classified the above email messages as follows.

  • Out of the 90 genuine messages, 88 are classified as genuine but 2 are wrongly flagged as spam.
  • The probability of correct classification of a message is at least 97%.

Using the above data, please answer the three questions below.

  1. What is the largest possible number of spam messages (out of 10) that the software can incorrectly classify as genuine?
  2. What is the smallest possible probability that the spam filter flags an email message as spam provided that the message is indeed a spam?
  3. What is the smallest probability that an email message is spam provided that it is flagged as spam by the spam filter?

My reasoning:

  1. We know that the spam filter has a correct classification rate of at least 97%, therefore at most 3 messages out of 100 are misclassified. With this, we know that the largest possible number of genuine messages being misclassified is 3.

  2. We know that 10 messages are indeed spam, and 3 messages might be misclassified, therefore the smallest possible probability is 7/10 or 70%.

  3. We know that 10 messages are indeed spam out of 100 total messages and 3 messages might be misclassified, therefore the smallest possible probability is 7/100 or 7%.

I am not sure if the above is correct or not due to how it is worded.

What would be the easiest way to find how to solve this when interpreting the text itself? And how could I solve it using conditional probabilities like Bayes?


0
On BEST ANSWER

This is how my lecturer answered the above question:

  1. $$\begin{aligned} 0.97 &= \frac{88}{88+x} \\ 88 + x &= 90.72 \simeq 91 \\ x &= 91 - 88 = 3 \end{aligned}$$

  2. $ 10 * 0.97 = 9.7$ which means that 9 out of 10 messages are spam, therefore at least 1 is indeed a spam.

  3. If $97\%$ on the spam side, then $9$ will be real spam out of $11$ possible spam. If $97\%$ on the genuine side, then $10$ out of $12$ will be real spam.
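The 9-out-of-11 and 10-out-of-12 figures above can be reproduced by plain counting. The sketch below is one possible reading, not necessarily the lecturer's method: it assumes that "at least 97%" means at least 97 of these 100 messages are classified correctly, and it enumerates every count `k` of spam messages passed as genuine. The names `feasible_missed_spam`, `min_recall` and `min_precision` are illustrative only.

```python
from fractions import Fraction

# Counts taken from the question.
GENUINE_TOTAL = 90
GENUINE_CORRECT = 88      # genuine messages classified as genuine
FALSE_POSITIVES = 2       # genuine messages wrongly flagged as spam
SPAM_TOTAL = 10
# Assumption: the 97% is the fraction of these 100 messages classified correctly.
MIN_ACCURACY = Fraction(97, 100)


def feasible_missed_spam():
    """Return every count k of spam passed as genuine that keeps accuracy >= 97%."""
    total = GENUINE_TOTAL + SPAM_TOTAL
    feasible = []
    for k in range(SPAM_TOTAL + 1):
        correct = GENUINE_CORRECT + (SPAM_TOTAL - k)
        if Fraction(correct, total) >= MIN_ACCURACY:
            feasible.append(k)
    return feasible


feasible = feasible_missed_spam()

# P(flagged | spam): caught spam over all spam, minimised over feasible k.
min_recall = min(Fraction(SPAM_TOTAL - k, SPAM_TOTAL) for k in feasible)

# P(spam | flagged): caught spam over everything flagged, minimised over feasible k.
min_precision = min(
    Fraction(SPAM_TOTAL - k, SPAM_TOTAL - k + FALSE_POSITIVES) for k in feasible
)

print(feasible)       # [0, 1]
print(min_recall)     # 9/10
print(min_precision)  # 9/11
```

Under this reading at most 1 spam message can slip through, `k = 1` gives 9 real spam out of 11 flagged, and `k = 0` gives 10 out of 12.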

1
On

I think that a reasonable solution is the following.

  1. Out of the 10 given spam messages, the maximum number of messages not detected as spam is $10\times (1-0.97)=0.3\approx 1$

  2. $\mathbb{P}(\text{Classified as Spam}|\text{Spam})=\frac{9.7}{10}=97\%$

  3. $\mathbb{P}(\text{Spam}|\text{Classified as Spam})=\frac{9.7}{11.7}\approx82.91\%$

This is because the messages "Classified as Spam" by the filter total the 2 known false positives plus at least 9.7 of the 10 given spam messages.
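The same 82.91% can be written as an application of Bayes' rule, which may help connect the counts to conditional probabilities. This is a minimal sketch that takes this answer's assumptions at face value: $\mathbb{P}(\text{Classified as Spam}|\text{Spam})=0.97$, a prior of $10/100$ for spam, and a false-positive rate of $2/90$ on genuine messages. The function name `posterior_spam_given_flagged` is made up for illustration.

```python
def posterior_spam_given_flagged(p_flag_given_spam, p_spam, p_flag_given_genuine):
    """Bayes' rule: P(spam | flagged) from the likelihoods and the prior."""
    p_genuine = 1 - p_spam
    numerator = p_flag_given_spam * p_spam
    # Total probability of a message being flagged (spam or genuine).
    evidence = numerator + p_flag_given_genuine * p_genuine
    return numerator / evidence


posterior = posterior_spam_given_flagged(
    p_flag_given_spam=0.97,       # assumed detection rate from this answer
    p_spam=10 / 100,              # 10 spam messages out of 100
    p_flag_given_genuine=2 / 90,  # 2 of the 90 genuine messages were flagged
)
print(f"{posterior:.4f}")
```

The result agrees with $9.7/11.7$, since multiplying numerator and denominator by 100 turns the probabilities back into message counts.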

0
On

I think an unreasonable solution which tries to follow the wording of the question is

  1. Of the $10$ actual spam messages, the largest number that the filter can incorrectly classify as genuine is $10$, and the smallest number is $0$, since there are $10$ actual spam messages and the filter may be right or wrong on each of them.

  2. The smallest possible probability that the spam filter flags an email message as spam provided that the message is indeed a spam is just over $0.7$, since $0.90\times 1+0.1 \times 0.7=0.97$. This would happen if the probability a genuine message is classified as spam is very small (you were unlucky in the particular sample)

  3. The smallest probability that an email message is spam provided that it is flagged as spam by the spam filter is at least $\sum\limits_{n=0}^{10} \frac{n}{n+2}{10 \choose n}0.7^n(1-0.7)^{10-n} \approx 0.771$, using answers 1 and 2

I doubt this is what is expected
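For anyone who wants to check the arithmetic in points 2 and 3, here is a short sketch. It assumes, as the sum does, that the number $n$ of spam messages caught is binomial with $10$ trials and success probability $0.7$, and that exactly $2$ genuine messages are flagged. The function name `expected_precision` is only illustrative.

```python
from math import comb


def expected_precision(p_detect, n_spam=10, false_positives=2):
    """Sum of n/(n+2) weighted by the Binomial(n_spam, p_detect) probabilities."""
    total = 0.0
    for n in range(n_spam + 1):
        weight = comb(n_spam, n) * p_detect**n * (1 - p_detect) ** (n_spam - n)
        total += n / (n + false_positives) * weight
    return total


# Point 2: a perfect rate on genuine mail and 0.7 on spam gives 97% overall.
overall_accuracy = 0.90 * 1 + 0.1 * 0.7

# Point 3: the binomial-weighted sum.
value = expected_precision(0.7)

print(round(overall_accuracy, 2))
print(round(value, 3))
```

The printed values should match the $0.97$ and $\approx 0.771$ quoted in the list.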