How to derive $P(Y=y_{i}|X) = \frac{P(X=x_k|Y=y_i)P(Y=y_i)}{\sum_{j}P(X=x_k|Y=y_j)P(Y=y_j)}$


I am reading material related to the Naive Bayes algorithm.

Assume that $Y$ is a boolean-valued random variable, and $X$ is a vector of $n$ boolean attributes. It claims that $P(Y=y_{i}|X)$ can be represented as

$$\frac{P(X=x_k|Y=y_i)P(Y=y_i)}{\sum_{j}P(X=x_k|Y=y_j)P(Y=y_j)}$$

I think Bayes' rule is involved here. But why is $P(Y=y_{i}|X)$ not represented as

$$\frac{P(X|Y=y_i)P(Y=y_i)}{P(X)}$$

I am particularly confused about the denominator.

I think the subscript $j$ denotes all possible values of $Y$.

Best Answer

There are two ideas used here. One is Bayes' rule, as you already know. For two events $A, B$, we have $$\Pr[A \mid B] = \frac{\Pr[B \mid A]\Pr[A]}{\Pr[B]}.$$ The second idea is the law of total probability, namely $$\Pr[A] = \Pr[A \mid B]\Pr[B] + \Pr[A \mid \bar B]\Pr[\bar B],$$ where $\bar B$ is the complementary event of $B$ (so in particular, $\Pr[B \cap \bar B] = 0$ and $\Pr[B] + \Pr[\bar B] = 1$).
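These two identities can be checked numerically on a tiny joint distribution. The probabilities below are hypothetical, chosen only so everything sums to 1:

```python
# Tiny joint distribution over two binary events A and B, used to verify
# Bayes' rule and the two-event law of total probability numerically.
# p[(a, b)] = Pr[A = a, B = b]; hypothetical values summing to 1.
p = {(1, 1): 0.12, (1, 0): 0.18, (0, 1): 0.28, (0, 0): 0.42}

pr_A = p[(1, 1)] + p[(1, 0)]          # Pr[A]  (marginalize over B)
pr_B = p[(1, 1)] + p[(0, 1)]          # Pr[B]  (marginalize over A)
pr_A_given_B = p[(1, 1)] / pr_B       # Pr[A | B]
pr_B_given_A = p[(1, 1)] / pr_A       # Pr[B | A]

# Bayes' rule: Pr[A | B] = Pr[B | A] Pr[A] / Pr[B]
assert abs(pr_A_given_B - pr_B_given_A * pr_A / pr_B) < 1e-12

# Law of total probability: Pr[A] = Pr[A|B]Pr[B] + Pr[A|~B]Pr[~B]
pr_A_given_notB = p[(1, 0)] / (1 - pr_B)
assert abs(pr_A - (pr_A_given_B * pr_B + pr_A_given_notB * (1 - pr_B))) < 1e-12
```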

The law of total probability extends naturally to any set of events that partition the sample space. In particular, for a discrete-valued random variable $Y$ with support $\{y_0, y_1, \ldots\}$, we have $$\Pr[A] = \sum_{i=0}^\infty \Pr[A \mid Y = y_i]\Pr[Y = y_i].$$ Taking $A$ to be the event $X = x_k$ gives exactly the denominator $P(X=x_k) = \sum_j P(X=x_k \mid Y=y_j)P(Y=y_j)$. So the two expressions you wrote are the same: the formula from your material is Bayes' rule with $P(X)$ expanded by the law of total probability, summing over all possible values $y_j$ of $Y$ (as you suspected).
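Putting the two steps together, here is a minimal sketch of the full posterior computation for a binary $Y$, with made-up values for the prior $P(Y=y_j)$ and the likelihood $P(X=x_k \mid Y=y_j)$:

```python
# Minimal sketch (hypothetical probabilities) showing that the denominator
# in the Naive Bayes posterior is just the law of total probability.
# Y takes two values y_0, y_1; x_k is a fixed observed attribute vector.

prior = [0.6, 0.4]        # P(Y = y_j), hypothetical
likelihood = [0.2, 0.5]   # P(X = x_k | Y = y_j), hypothetical

# Denominator: P(X = x_k) = sum_j P(X = x_k | Y = y_j) P(Y = y_j)
evidence = sum(l * p for l, p in zip(likelihood, prior))

# Posterior for each class via Bayes' rule
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

print(evidence)        # the marginal P(X = x_k)
print(posterior)       # P(Y = y_j | X = x_k) for each class
```

Because the denominator is the sum over all classes of the numerator, the posteriors are guaranteed to sum to 1, which is exactly why the expanded form is used.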