I am trying to understand the derivations associated with the Naive Bayes assumption. One document I read starts off by stating this:
\begin{align} P(x | C_k) &= P(x_1, x_2, \dots , x_D |C_k) \\ & = P(x_1 | x_2, \dots, x_D,C_k) P(x_2 | x_3, \dots, x_D,C_k) \dots P(x_{D-1} | x_D,C_k) P(x_D |C_k) \end{align} We can simplify things if we naively assume that the individual feature dimensions $(x_1, x_2, \dots, x_D)$ are independent, that is: $P(x_1 | x_2, x_3, \dots, x_D, C_k) = P(x_1 | C_k)$
I don't understand how the last equality in the quote is derived. I tried going back to the basic definition of conditional probability (i.e. $P(A | B) = \frac{P(A, B)}{P(B)}$), but for some reason could not prove that the left side equals the right side.
Can somebody please provide the steps for deriving the last line (i.e. proving that the left side equals the right side)?
Using a slight change in notation, the simplest version of the identity is $$ P(x,y\mid C)=P(x\mid y, C)P(y\mid C)\tag1 $$ which you can prove as follows: $$ P(x,y\mid C)=\frac{P(x,y,C)}{P(C)}=\frac{P(x,y,C)}{P(y,C)}\frac{P(y,C)}{P(C)} =P(x\mid y, C)P(y\mid C) $$ You can prove further iterations of this identity by applying (1) repeatedly. For example, the three-variable version proceeds as follows: $$ P(x,\color{red}{y,z}\mid C)=P(x \mid \color{red}{y,z},C)P(\color{red}{y,z}\mid C)\tag2 $$ by replacing $y$ in (1) with the pair $\color{red}{y,z}$. Then expand the second factor on the RHS of (2): $$ P(x,y,z\mid C)=P(x \mid y,z,C)\color{red}{P(y,z\mid C)} = P(x \mid y,z,C)\color{red}{P(y\mid z, C)P(z\mid C)} $$ by replacing $x,y$ in (1) with $y,z$ respectively. See if you can prove the four-variable version: $$ P(x,y,z,w\mid C)=P(x\mid y,z,w,C)P(y\mid z,w,C)P(z\mid w,C)P(w\mid C) $$
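If it helps to see identity (1) hold numerically, here is a minimal sanity check in NumPy. The joint table is made-up random data (not from the question): it builds an arbitrary distribution $P(x,y,C)$ over three binary variables and confirms that $P(x,y\mid C)=P(x\mid y, C)\,P(y\mid C)$ entry by entry.

```python
# Numeric sanity check of identity (1): P(x, y | C) = P(x | y, C) * P(y | C).
# The joint table is arbitrary random data, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Random joint distribution P(x, y, C) over binary x, y, C (axes 0, 1, 2).
joint = rng.random((2, 2, 2))
joint /= joint.sum()

P_C = joint.sum(axis=(0, 1))   # marginal P(C), shape (2,)
P_yC = joint.sum(axis=0)       # marginal P(y, C), shape (2, 2)

lhs = joint / P_C                       # P(x, y | C) = P(x, y, C) / P(C)
rhs = (joint / P_yC) * (P_yC / P_C)     # P(x | y, C) * P(y | C)

assert np.allclose(lhs, rhs)
print("identity (1) holds for every (x, y, C) cell")
```

The middle step of the code mirrors the proof above exactly: dividing and multiplying by $P(y,C)$ telescopes, which is why the identity holds for any joint table.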