Interpreting the results of a Naive Bayes classifier.

Using the Naive Bayes formula to classify text I have something like...

$$ P(Cat|Word1) = \frac{P(Word1|Cat) * P(Cat)}{P(Word1)} $$

Using a small example ...

Cat1 = 4 documents
     =  1x word 'Hello'
     =  3x word 'World'

Cat2 = 10 documents
     = 10x word 'Hello'
     =  1x word 'World'

Total of 14 docs. with 2 'categories'

I can then calculate the probability of Cat1 and Cat2

$$ P(Cat1|Hello,World) = \frac{P(Hello|Cat1) * P(World|Cat1) * P(Cat1)}{P(Hello) * P(World)} $$

For category 1

$$ P(Cat1|Hello,World) = \frac{\frac{1}{4} * \frac{3}{4} * \frac{4}{14}}{\frac{11}{14} * \frac{4}{14}} \approx 0.2386 $$

And category 2

$$ P(Cat2|Hello,World) = \frac{\frac{10}{10} * \frac{1}{10} * \frac{10}{14}}{\frac{11}{14} * \frac{4}{14}} \approx 0.3182 $$

But I am struggling to interpret the values being returned,

  • Does it mean that there is a 24% chance of category 1 and a 32% chance of category 2?
  • Does it mean that category 2 is slightly more likely than category 1?
  • How much more likely is one category over the other?

How can I interpret the actual values being returned?

There is 1 best solution below

Bayes' theorem says $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$. Naive Bayes additionally assumes that the features are conditionally independent given the class, so you can write $P(B|A) = \prod_i P(b_i|A)$.

However, the example you give is not really Naive Bayes, because you give the exact data and that data does not seem to satisfy the class-conditional independence assumption. If no document in Cat1 contains both words, then $P(Hello,World|Cat1) = 0$, but $P(Hello|Cat1)*P(World|Cat1) = 1/4 * 3/4 = 3/16$, so clearly the two are not the same.
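As a quick sanity check of that mismatch, here is a minimal Python sketch. It assumes each document is simply the set of words it contains, and that the documents are laid out as three 'World' documents plus one 'Hello' document in Cat1, and nine 'Hello' documents plus one 'Hello, World' document in Cat2. The question does not state this layout, so treat it as an assumption; the names `docs`, `cond_prob` and `independence_check` are made up for illustration.

```python
from fractions import Fraction

# Assumed layout (not given in the question): one set of words per document.
docs = {
    "Cat1": [{"World"}] * 3 + [{"Hello"}],
    "Cat2": [{"Hello"}] * 9 + [{"Hello", "World"}],
}

def cond_prob(words, cat):
    """Fraction of documents in `cat` that contain every word in `words`."""
    hits = sum(1 for d in docs[cat] if words <= d)
    return Fraction(hits, len(docs[cat]))

def independence_check(cat):
    """Return (joint, product of marginals) for Hello and World within `cat`."""
    joint = cond_prob({"Hello", "World"}, cat)
    product = cond_prob({"Hello"}, cat) * cond_prob({"World"}, cat)
    return joint, product

for cat in docs:
    joint, product = independence_check(cat)
    print(cat, "joint =", joint, "product =", product)
```

Under this layout the joint and the product differ for Cat1 (0 versus 3/16) and happen to agree for Cat2 (1/10 in both cases), so the independence assumption fails in at least one class.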

Nevertheless, if you just use Bayes' theorem, you should be able to get to the right probability.

Assume your data is: Cat1 = [(World) $\times$ 3, (Hello)]; Cat2 = [(Hello) $\times$ 9, (Hello, World)]

Then use Bayes' theorem (not Naive Bayes):

$P(Cat1|Hello,World) = \frac{P(Hello,World|Cat1)*P(Cat1)}{P(Hello,World)} = \frac{0*\frac{4}{14}}{\frac{1}{14}} = 0$

(Since we know the exact distribution, we can get $P(Hello,World|Cat1)=0$ and $P(Hello,World)=\frac{1}{14}$)

$P(Cat2|Hello, World) = \frac{P(Hello,World|Cat2)*P(Cat2)}{P(Hello,World)} = \frac{\frac{1}{10}*\frac{10}{14}}{\frac{1}{14}}=1$
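The two calculations above can be reproduced with exact fractions. This is only a sketch under the same assumed document layout; `docs` and `posterior` are illustrative names, and the evidence $P(Hello,World)$ is obtained by summing likelihood times prior over both categories.

```python
from fractions import Fraction

# Assumed layout from the answer: one set of words per document.
docs = {
    "Cat1": [{"World"}] * 3 + [{"Hello"}],
    "Cat2": [{"Hello"}] * 9 + [{"Hello", "World"}],
}
total = sum(len(ds) for ds in docs.values())  # 14 documents overall

def posterior(query):
    """Exact P(cat | query) from Bayes' theorem, with no independence assumption."""
    joint = {}
    for cat, ds in docs.items():
        likelihood = Fraction(sum(1 for d in ds if query <= d), len(ds))
        prior = Fraction(len(ds), total)
        joint[cat] = likelihood * prior
    evidence = sum(joint.values())  # P(query); must be non-zero to divide
    return {cat: p / evidence for cat, p in joint.items()}

result = posterior({"Hello", "World"})
print(result["Cat1"], result["Cat2"])
```

Because the evidence is the sum of the per-category terms, the posteriors always add up to 1; here Cat1 gets 0 and Cat2 gets 1. The function would divide by zero for a query that matches no document at all.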

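So in this case, yes:

  • there is a 0% chance of Cat1 and a 100% chance of Cat2

  • it means the document has to be in Cat2

  • Cat2 is certain and Cat1 is ruled out, so the ratio between them is unbounded rather than a finite "times more likely"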

You can interpret the values the same way in Naive Bayes, but when you use real data to describe the distribution, please check the assumptions.
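For comparison, here is a sketch of what the Naive Bayes version might look like if the independence assumption is accepted anyway. It uses the per-category counts from the question and no smoothing. One caveat that is not part of the answer above: dividing by $P(Hello) * P(World)$ does not in general give values that sum to 1, so this sketch uses the common alternative of normalizing by the sum of the per-category scores. The names `doc_counts`, `word_counts` and `naive_bayes_posterior` are illustrative.

```python
from fractions import Fraction

# Counts taken from the question: documents per category, and the number of
# documents in each category that contain each word.
doc_counts = {"Cat1": 4, "Cat2": 10}
word_counts = {
    "Cat1": {"Hello": 1, "World": 3},
    "Cat2": {"Hello": 10, "World": 1},
}

def naive_bayes_posterior(words):
    """Normalized Naive Bayes posterior (no smoothing; zero counts give zero scores)."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for cat, n_docs in doc_counts.items():
        score = Fraction(n_docs, total_docs)  # prior P(cat)
        for w in words:
            score *= Fraction(word_counts[cat][w], n_docs)  # P(word | cat)
        scores[cat] = score
    evidence = sum(scores.values())  # normalizing constant
    return {cat: s / evidence for cat, s in scores.items()}

post = naive_bayes_posterior(["Hello", "World"])
odds = post["Cat2"] / post["Cat1"]
print(post["Cat1"], post["Cat2"], odds)
```

With these counts the normalized posteriors are 3/7 for Cat1 and 4/7 for Cat2, so under the naive assumption Cat2 is 4/3 times as likely as Cat1. That differs from the exact answer of 0 and 1, which is the price of an independence assumption the data does not satisfy.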