In the Naive Bayes classifier, how is P(sneezing, builder | flu) = P(sneezing | flu) P(builder | flu)?


Please refer to this literature:

According to Naive Bayes classification algorithm:

$P(\text{sneezing},\text{builder}\mid \text{flu}) = P(\text{sneezing}\mid \text{flu})\,P(\text{builder}\mid \text{flu})$

where sneezing and builder are independent events.

How do they arrive at the above conclusion mathematically?

Is it something like:

$P(\text{sneezing},\text{builder}\mid \text{flu}) = P(\text{sneezing}\cap \text{builder}\mid \text{flu}) = \dfrac{P((\text{sneezing}\cap \text{builder})\cap \text{flu})}{P(\text{flu})}$

Let $S$, $B$, and $F$ denote "sneezing", "builder", and "flu" respectively, which I suppose are events in a discrete probability space. We are given that $S$ and $B$ are independent, which means that $$ \sum_{i\in S\cap B}p(i)=\sum_{s\in S}\sum_{b\in B}p(s)p(b), $$ where $p(x)$ is the probability assigned to the element $x$ of the sample space, a.k.a. the probability mass function.

Writing out $\mathbb P(S,B\mid F)$ explicitly as a fraction yields $$ \mathbb P(S,B\mid F)=\frac{\sum_{i\in S\cap B\cap F}p(i)}{\sum_{f\in F}p(f)}.\qquad (1) $$ Note that in order for this quantity to even make sense, we are assuming that the denominator is non-zero: in other words, $\mathbb P(F)>0$.

On the other hand, we can write out $\mathbb P(S\mid F)\mathbb P(B\mid F)$ explicitly as follows: $$ \mathbb P(S\mid F)\mathbb P(B\mid F)=\frac{\sum_{i\in S\cap F}p(i)}{\sum_{f\in F}p(f)}\cdot\frac{\sum_{j\in B\cap F}p(j)}{\sum_{f\in F}p(f)}.\qquad (2) $$ Your question asks why $(1)$ and $(2)$ are equal. Multiplying through by the denominators, this is the same as asking why the following two expressions are equal: $$ \sum_{i\in S\cap F}p(i)\sum_{j\in B\cap F}p(j)\overset{?}{=}\sum_{i\in S\cap B\cap F}p(i)\sum_{f\in F}p(f). $$ And in fact, written out explicitly like this, we see that the hypothesis you have been given does not imply that $(1)$ and $(2)$ are equal! Indeed, it is easy to cook up a counterexample on a small sample space where equality fails (I will leave this exercise to you, since it is instructive to work out on your own). In other words, there is no mathematical explanation of why $(1)$ and $(2)$ are equal under your assumption, although there is a non-mathematical one.
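If you want to experiment with candidate counterexamples, here is a small helper sketch (the function names are my own) that computes both sides, $(1)$ and $(2)$, for arbitrary events on a finite sample space, using exact rational arithmetic:

```python
from fractions import Fraction

def cond_prob(p, event, given):
    # P(event | given) on a finite sample space:
    # p maps outcomes to probabilities; event and given are sets.
    num = sum(p[x] for x in event & given)
    den = sum(p[x] for x in given)
    return num / den  # assumes the denominator P(given) > 0

def both_sides(p, S, B, F):
    # Return the pair ((1), (2)): P(S, B | F) and P(S | F) * P(B | F).
    return cond_prob(p, S & B, F), cond_prob(p, S, F) * cond_prob(p, B, F)

# Sanity check: under the uniform distribution on {1, 2, 3, 4},
# S = {1, 2} and B = {1, 3} are independent; conditioning on the
# whole sample space, both sides reduce to P(S ∩ B) and P(S)P(B),
# which agree precisely because S and B are independent.
omega = {1, 2, 3, 4}
p = {x: Fraction(1, 4) for x in omega}
S, B = {1, 2}, {1, 3}
print(both_sides(p, S, B, omega))  # (Fraction(1, 4), Fraction(1, 4))
```

Trying various choices of $F$ here (rather than the whole sample space) is a quick way to test whether $(1)$ and $(2)$ actually coincide for your candidate events.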

How can this be? On the webpage you linked to, they were deliberately vague when expressing their assumptions:

"Probability theory says that if several factors don't depend on each other in any way, the probability of seeing them together is just the product of their probabilities."

This condition, applied to $S$ and $B$, is stronger than saying that $S$ and $B$ are independent. What they are really trying to say is that $S$ and $B$ are conditionally independent given any "reasonable" third event. Thus $(1)=(2)$ is an additional assumption, justified on intuitive grounds: independence of $S$ and $B$ is assumed to hold not only on the original sample space, but also on each of the smaller sample spaces $F$ and its complement. That is the meaning of conditional independence in this case. (The general case involves partitioning the sample space into several smaller parts according to the values of a random variable, and imposing the condition that independence continues to hold on each of these smaller sample spaces.)
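To see conditional independence in action, here is a sketch (with made-up numbers, chosen only for illustration) that builds a joint distribution in which $S$ and $B$ are independent given $F$ and given its complement by construction, and then verifies numerically that $(1)$ and $(2)$ agree:

```python
from fractions import Fraction
from itertools import product

# Hypothetical parameters: P(F), P(S | F), P(S | not F), P(B | F), P(B | not F).
pF = Fraction(3, 10)
pS = {True: Fraction(4, 5), False: Fraction(1, 10)}
pB = {True: Fraction(1, 5), False: Fraction(1, 5)}

# Build the joint p(f, s, b) as P(F=f) * P(S=s | F=f) * P(B=b | F=f),
# so S and B are conditionally independent given F (and given its
# complement) by construction.
p = {}
for f, s, b in product([True, False], repeat=3):
    p[(f, s, b)] = (pF if f else 1 - pF) \
        * (pS[f] if s else 1 - pS[f]) \
        * (pB[f] if b else 1 - pB[f])

def cond(event, given):
    # P(event | given), with event and given as predicates on outcomes.
    num = sum(q for x, q in p.items() if event(x) and given(x))
    den = sum(q for x, q in p.items() if given(x))
    return num / den

def S(x): return x[1]
def B(x): return x[2]
def F(x): return x[0]

# (1) equals (2) here, precisely because conditional independence
# given F was imposed when building the joint distribution.
assert cond(lambda x: S(x) and B(x), F) == cond(S, F) * cond(B, F)
print(cond(lambda x: S(x) and B(x), F))  # Fraction(4, 25)
```

Note that $\mathbb P(S,B\mid F) = \tfrac{4}{5}\cdot\tfrac{1}{5} = \tfrac{4}{25}$ falls out directly from the conditional parameters, which is exactly the factorization the Naive Bayes classifier assumes.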