Conditional probability via the sum and product rules


I am reading Bishop's Pattern Recognition and Machine Learning.

On page 73 (Section 2.1), I can't understand formula 2.19:

$$p(x=1|\mathcal{D})=\int_0^1 p(x=1|\mu)p(\mu|\mathcal{D})\text{d}\mu $$

The author says this is obtained from the sum and product rules.

The sum rule is:

$$p(X) = \sum_Y p(X,Y)$$

and the product rule is: $$p(X,Y)=p(Y|X)p(X)$$

But I can't deduce the formula from these. Could you help me? Thanks very much.


There are 2 best solutions below

0
On

First, let's deal with the sum rule: $p(X) = \sum_Y p(X,Y)$

  • Note that $(X,Y=y_i)\cap(X,Y=y_j)=\emptyset\;\;\forall (i\neq j)$
  • Therefore, the different values of $Y$ partition the event $X$ into the mutually exclusive pieces $(X,Y=y_i)$.
  • The sum rule just says that if you've sliced up the probability of X according to which Y it occurs with, then to reconstitute the probability of X, just add up the probability of the slices.
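As a minimal sketch of the two rules on a discrete example: the joint table below is made up purely for illustration (the labels `x0`, `x1`, `y0`, `y1` and the probabilities are assumptions, not from the book). Summing the slices recovers $p(X)$, and multiplying the conditional by the marginal recovers the joint.

```python
# Hypothetical joint distribution p(X, Y) over two binary variables.
# The numbers are illustrative only; they just need to sum to 1.
joint = {
    ("x0", "y0"): 0.10,
    ("x0", "y1"): 0.30,
    ("x1", "y0"): 0.20,
    ("x1", "y1"): 0.40,
}


def marginal_x(joint, x):
    """Sum rule: p(X=x) is the sum over y of p(X=x, Y=y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def conditional_y_given_x(joint, y, x):
    """Definition of the conditional: p(Y=y | X=x) = p(X=x, Y=y) / p(X=x)."""
    return joint[(x, y)] / marginal_x(joint, x)


# Sum rule: add up the slices of X=x1 across all values of Y.
p_x1 = marginal_x(joint, "x1")

# Product rule: p(X, Y) = p(Y | X) p(X) gives back the joint entry.
p_y1_given_x1 = conditional_y_given_x(joint, "y1", "x1")
product = p_y1_given_x1 * p_x1

print(p_x1, product, joint[("x1", "y1")])
```

The slices for `x1` are exactly the mutually exclusive pieces described in the bullets above, which is why simply adding them is legitimate.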

The product rule just shows you how you convert a conditional probability to a joint probability.

Now, let's look at your integral: $p(x=1|\mathcal{D})=\int_0^1 p(x=1|\mu)p(\mu|\mathcal{D})\text{d}\mu$

Note that $p(x=1|\mu)p(\mu|\mathcal{D})$ is a conditional probability (given $\mu$) multiplied by the density of $\mu$ given $\mathcal{D}$. Here $\mathcal{D}$ is the observed data set, and everything in the formula is conditioned on it. Provided that $x$ depends on $\mathcal{D}$ only through $\mu$, so that $p(x=1|\mu,\mathcal{D})=p(x=1|\mu)$, the product rule says that this product is the joint density $p(x=1,\mu|\mathcal{D})$.

From the sum rule, we know that adding up the probabilities of mutually exclusive joint events that share a common element gives the probability of that common element. The integral is the limit of such a sum over the values of $\mu$, hence you get the probability of $x=1$ given $\mathcal{D}$, with $\mu$ marginalized out.

The "discrete" formulations you quoted can be turned into statements about densities just by taking limits as the events $\{X\in (a,b), Y\in (c,d)\}$ collapse around $x$ and $y$.
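To make the limiting step concrete, here is a sketch under some notation introduced only for illustration: split $[0,1]$ into $N$ bins of width $\Delta\mu = 1/N$ with midpoints $\mu_i$, so that $p(\mu_i|\mathcal{D})\,\Delta\mu$ approximates the probability that $\mu$ falls in the $i$-th bin. The discrete sum rule over the bins then reads

$$p(x=1|\mathcal{D}) \approx \sum_{i=1}^{N} p(x=1|\mu_i)\,p(\mu_i|\mathcal{D})\,\Delta\mu \;\xrightarrow[N\to\infty]{}\; \int_0^1 p(x=1|\mu)p(\mu|\mathcal{D})\text{d}\mu$$

Each term in the sum is a product-rule factorization of "$x=1$ and $\mu$ in bin $i$", and the bins are mutually exclusive, which is what lets you add them.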

5
On

\begin{align} p(x|\mathcal{D})&\overset{(a)}=\int_0^1p(x,\mu|\mathcal{D})d\mu \\ &\overset{(b)}=\int_0^1p(x|\mu,\mathcal{D})p(\mu|\mathcal{D})d\mu \\ &\overset{(c)}=\int_0^1p(x|\mu)p(\mu|\mathcal{D})d\mu \end{align}

where (a) is application of the sum rule, (b) is application of the product rule, and (c) holds when $p(x|\mu,\mathcal{D}) = p(x|\mu)$, i.e., $x$ conditioned on $\mu$ is independent of $\mathcal{D}$.
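A quick numerical check of the final line can help build confidence. The sketch below assumes $x$ is Bernoulli, so $p(x=1|\mu)=\mu$, and assumes the posterior $p(\mu|\mathcal{D})$ is a Beta density with hypothetical parameters `a = 5`, `b = 3` (illustrative values, not taken from the question). Under those assumptions the integral should equal the Beta mean $a/(a+b)$. The function names `beta_pdf` and `predictive_x1` are made up for this example.

```python
import math


def beta_pdf(mu, a, b):
    """Beta(mu | a, b) density, evaluated via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1) * math.log(mu) + (b - 1) * math.log(1 - mu)
    )


def predictive_x1(a, b, n=20_000):
    """Midpoint-rule approximation of the integral over [0, 1] of
    p(x=1 | mu) * p(mu | D), with p(x=1 | mu) = mu (Bernoulli assumption)
    and p(mu | D) = Beta(mu | a, b) (assumed posterior)."""
    d_mu = 1.0 / n
    total = 0.0
    for i in range(n):
        mu = (i + 0.5) * d_mu  # bin midpoint, strictly inside (0, 1)
        total += mu * beta_pdf(mu, a, b) * d_mu
    return total


a, b = 5.0, 3.0  # hypothetical posterior parameters
numeric = predictive_x1(a, b)
closed_form = a / (a + b)  # mean of Beta(a, b)

print(numeric, closed_form)
```

The loop is the sum rule applied to thin slices of $\mu$, and each summand is the product-rule term from step (b) after the simplification in step (c).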