Naive Bayes: Conditional Independence vs. Marginal Independence Assumption


Apologies if this question would be better placed on Stat SE.

I am following Kevin Murphy's tutorial A brief introduction to Bayes' Rule. I can follow his derivations formally, but my intuition breaks down on the following question:

Say we ask "What is the probability that someone has a disease, given that two tests for this disease both yield a positive result?". With $D$ for "disease: yes/no" and $T_1, T_2$ for "test positive: yes/no", we want:

(1) $P(D=1 \; \vert \; T_1 = 1, T_2 = 1)$.

By the chain rule and the definition of conditional probability, this is equivalent to:

(2) $P(D=1)P(T_1 = 1 \, \vert \, D=1)P(T_2=1 \, \vert \, D=1, T_1=1) \over P(T_1=1, T_2 =1)$

By the Naive Bayes assumption of conditional independence of $T_1, T_2$ given $D$, i.e. $T_1 \bot T_2 \; \vert \; D$, the third factor of the numerator can be simplified and we get:

(3) $P(D=1)P(T_1 = 1 \, \vert \, D=1)P(T_2=1 \, \vert \, D=1) \over P(T_1=1, T_2 =1)$

Correct so far? The denominator can be calculated by marginalizing over $D$:

(4) $P(D=1)P(T_1 = 1 \, \vert \, D=1)P(T_2=1 \, \vert \, D=1) \over P(D=1)P(T_1=1,T_2=1 \, \vert \, D=1)\,+\,P(D=0)P(T_1=1,T_2=1 \, \vert \, D=0)$

Applying once more $T_1 \bot T_2 \; \vert \; D$, we get:

(6) $P(D=1)P(T_1 = 1 \, \vert \, D=1)P(T_2=1 \, \vert \, D=1) \over P(D=1)P(T_1=1 \, \vert \, D=1)P(T_2=1 \, \vert \, D=1)\,+\,P(D=0)P(T_1=1 \, \vert \, D=0)P(T_2=1 \, \vert \, D=0)$

This matches the final step in the Bayes example of the tutorial, and it seems I can, in principle, follow the derivation up to here.
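To make this concrete, here is a minimal numeric sketch of formula (6) in Python. The prior $P(D=1)=0.01$, sensitivity $P(T_i=1 \, \vert \, D=1)=0.9$ and false-positive rate $P(T_i=1 \, \vert \, D=0)=0.05$ are made-up illustration values (not taken from the tutorial), and both tests are assumed to have identical characteristics:

```python
# A minimal numeric sketch of formula (6). The prior P(D=1) = 0.01, the
# sensitivity P(T_i=1 | D=1) = 0.9 and the false-positive rate
# P(T_i=1 | D=0) = 0.05 are made-up illustration values.
p_d = 0.01   # P(D=1)
sens = 0.9   # P(T_i=1 | D=1), assumed identical for both tests
fpr = 0.05   # P(T_i=1 | D=0)

numerator = p_d * sens * sens
denominator = p_d * sens * sens + (1 - p_d) * fpr * fpr
posterior = numerator / denominator
print(f"P(D=1 | T1=1, T2=1) = {posterior:.4f}")  # ~0.7660 with these numbers
```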

However, suppose now that instead of going from (4) to (6) above, i.e. marginalizing $P(T_1=1, T_2=1)$ over $D$ and then applying conditional independence, we continue from (3) by expanding the denominator with the chain rule:

(7) $P(D=1)P(T_1 = 1 \, \vert \, D=1)P(T_2=1 \, \vert \, D=1) \over P(T_1=1)P(T_2 =1 \, \vert \, T_1=1)$

Then, assuming marginal independence between $T_1$ and $T_2$, i.e. $T_1 \bot T_2$, we get:

(8) $P(D=1)P(T_1 = 1 \, \vert \, D=1)P(T_2=1 \, \vert \, D=1) \over P(T_1=1)P(T_2 =1)$

Finally, marginalizing each factor of the denominator over $D$ gives:

(9) $P(D=1)P(T_1 = 1 \, \vert \, D=1)P(T_2=1 \, \vert \, D=1) \over [P(D=1)P(T_1=1 \, \vert \, D=1) \, + \, P(D=0)P(T_1=1 \, \vert \, D=0)] \; \times \; [P(D=1)P(T_2=1 \, \vert \, D=1) \, + \, P(D=0)P(T_2=1 \, \vert \, D=0)]$

Results (6) and (9) are not identical. My intuition fails to tell me what the difference between them is, and more specifically, why (6) is the correct result and not (9).
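Evaluating both expressions with the same made-up numbers as in the sketch above makes the difference visible: the denominator of (6) is the joint $P(T_1=1, T_2=1)$ obtained by marginalizing over $D$, while the denominator of (9) is the product of the marginals, and the two do not agree:

```python
# Same made-up numbers as above: compare the denominator of (6), the joint
# P(T1=1, T2=1) obtained by marginalizing over D, with the denominator
# of (9), the product of the marginals P(T1=1) * P(T2=1).
p_d, sens, fpr = 0.01, 0.9, 0.05

p_t = p_d * sens + (1 - p_d) * fpr                 # P(T_i=1) = 0.0585
joint_true = p_d * sens**2 + (1 - p_d) * fpr**2    # 0.010575
joint_product = p_t * p_t                          # 0.00342225

numerator = p_d * sens**2
print(numerator / joint_true)     # formula (6): ~0.766, a valid probability
print(numerator / joint_product)  # formula (9): ~2.367, not even <= 1
```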

Please note: I understand that conditional independence and marginal independence do not imply one another, and that my derivation of Naive Bayes is "wrong" in the sense that I am using the wrong kind of independence assumption for the denominator in step (8).

But it seems to me that, intuitively, marginal independence between $T_1$ and $T_2$ should be the "correct" assumption here, in the sense that the denominator expresses the probability of the intersection of the events $T_1=1, T_2=1$, where we are (implicitly) assuming that these two tests do not "influence or cause each other". The latter is why marginal independence intuitively seems to be the assumption that should apply here.

Perhaps someone can explain either where I made a mistake in the above derivations, or why my intuition is wrong here, and why (intuitively) marginal independence for $T_1, T_2$ is not what we want here.

Best answer

You make the assumption that: $\mathsf P(T_1{=}1,T_2{=}1) = \mathsf P(T_1{=}1)\mathsf P(T_2{=}1)$

However, this is not justified. The test results are not marginally independent for the same individual. They would be independent if each result came from a separate, randomly sampled member of the population; but for a specific individual they are only conditionally independent given the disease state.
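A small Monte Carlo sketch of this distinction, again with made-up parameters (prior $0.01$, sensitivity $0.9$, false-positive rate $0.05$): two test results for the same individual are correlated through the hidden disease state, whereas results from two independently sampled individuals are not:

```python
import random

# Monte Carlo sketch (made-up parameters: P(D=1)=0.01, sensitivity 0.9,
# false-positive rate 0.05). Two test results for the SAME individual are
# correlated through the hidden disease state; results obtained from two
# independently sampled individuals are not.
random.seed(0)
p_d, sens, fpr = 0.01, 0.9, 0.05
N = 200_000

def draw_d():
    return random.random() < p_d                    # sample a disease state

def test(d):
    return random.random() < (sens if d else fpr)   # sample one test result

same = [(test(d), test(d)) for d in (draw_d() for _ in range(N))]   # same person
diff = [(test(draw_d()), test(draw_d())) for _ in range(N)]         # two people

for name, pairs in [("same individual", same), ("different individuals", diff)]:
    p1 = sum(a for a, _ in pairs) / N
    p2 = sum(b for _, b in pairs) / N
    p12 = sum(a and b for a, b in pairs) / N
    print(f"{name}: P(T1=1,T2=1)={p12:.4f}   P(T1=1)P(T2=1)={p1 * p2:.4f}")
```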

Example: Imagine that I have a bag containing equal numbers of double-headed and double-tailed coins. If I were to select a coin, toss it, return it, and then select another coin to toss, the result of the second toss would be independent of the result of the first. However, if I select a coin, toss it, obtain a head, and then keep that coin to toss again, I assert that the result of the second toss will be very much dependent on what I obtained on the first toss.
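A quick simulation of this coin-bag example (hypothetical code, assuming an equal mix of double-headed and double-tailed coins) shows the same effect: reusing the drawn coin makes the two tosses perfectly dependent, while drawing a fresh coin for the second toss makes them independent:

```python
import random

# Simulation of the coin-bag example: the bag holds equal numbers of
# double-headed and double-tailed coins. Reusing the drawn coin makes the two
# tosses perfectly dependent; drawing a fresh coin for the second toss does not.
random.seed(0)
N = 100_000

def toss(double_headed):
    return "H" if double_headed else "T"   # a double-headed coin always lands heads

reuse, fresh = [], []
for _ in range(N):
    coin = random.random() < 0.5                             # True = double-headed
    reuse.append((toss(coin), toss(coin)))                   # toss the same coin twice
    fresh.append((toss(coin), toss(random.random() < 0.5)))  # new coin for toss 2

for name, pairs in [("same coin", reuse), ("fresh coin", fresh)]:
    p_hh = sum(a == "H" and b == "H" for a, b in pairs) / N
    p_h1 = sum(a == "H" for a, _ in pairs) / N
    print(f"{name}: P(both heads)={p_hh:.3f}   P(first head)^2={p_h1 ** 2:.3f}")
# same coin  -> P(both heads) ~ 0.50, but 0.50^2 = 0.25 (dependent)
# fresh coin -> P(both heads) ~ 0.25 = 0.50^2           (independent)
```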