I am reading Bishop's Pattern Recognition and Machine Learning.
On page 73 (Section 2.1), I can't understand formula (2.19):
$$p(x=1|\mathcal{D})=\int_0^1 p(x=1|\mu)p(\mu|\mathcal{D})\text{d}\mu $$
The author says this is obtained from the sum and product rules.
The sum rule is:
$$p(X) = \sum_Y p(X,Y)$$
and the product rule is: $$p(X,Y)=p(Y|X)p(X)$$
But from these, I can't deduce the formula. Could you help me? Thanks very much.
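First, the sum rule $p(X) = \sum_Y p(X,Y)$ says that adding up the joint probabilities over every value of $Y$ gives the marginal probability of $X$.
The product rule just shows you how to convert a conditional probability into a joint probability. Both rules still hold when every term is additionally conditioned on the same thing, e.g. $p(X,Y|Z)=p(Y|X,Z)\,p(X|Z)$, and for a continuous variable the sum becomes an integral.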
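A small discrete sketch of both rules may help before moving to densities. The joint table below is made up purely for illustration; exact fractions are used so the checks are exact.

```python
from fractions import Fraction as F

# Hypothetical joint distribution p(X, Y) with X in {0, 1} and Y in {"a", "b", "c"}.
joint = {
    (0, "a"): F(1, 10), (0, "b"): F(2, 10), (0, "c"): F(1, 10),
    (1, "a"): F(3, 10), (1, "b"): F(1, 10), (1, "c"): F(2, 10),
}

def marginal_x(joint):
    """Sum rule: p(X) = sum over Y of p(X, Y)."""
    p = {}
    for (x, _y), pxy in joint.items():
        p[x] = p.get(x, F(0)) + pxy
    return p

def conditional_y_given_x(joint, p_x):
    """Conditional p(Y | X) = p(X, Y) / p(X)."""
    return {(x, y): pxy / p_x[x] for (x, y), pxy in joint.items()}

p_x = marginal_x(joint)
p_y_given_x = conditional_y_given_x(joint, p_x)

# Product rule: p(X, Y) = p(Y | X) p(X) rebuilds the joint table exactly.
rebuilt = {(x, y): p_y_given_x[(x, y)] * p_x[x] for (x, y) in joint}

print(p_x)               # {0: Fraction(2, 5), 1: Fraction(3, 5)}
print(rebuilt == joint)  # True
```

Summing out $Y$ collapses the table to $p(X)$, and multiplying the conditional by the marginal goes back the other way.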
Now, let's look at your integral: $p(x=1|\mathcal{D})=\int_0^1 p(x=1|\mu)p(\mu|\mathcal{D})\text{d}\mu$
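Note that $p(x=1|\mu)p(\mu|\mathcal{D})$ is the probability of $x=1$ given $\mu$ multiplied by the posterior density of $\mu$ given the observed data $\mathcal{D}$. Applying the product rule with everything conditioned on $\mathcal{D}$ gives
$$p(x=1,\mu|\mathcal{D}) = p(x=1|\mu,\mathcal{D})\,p(\mu|\mathcal{D}).$$
In this model the observations are independent Bernoulli draws given $\mu$, so once $\mu$ is known the data $\mathcal{D}$ carry no further information about a new $x$, i.e. $p(x=1|\mu,\mathcal{D}) = p(x=1|\mu)$. Hence the integrand is exactly the joint density $p(x=1,\mu|\mathcal{D})$.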
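One way to see that $p(x=1|\mu)\,p(\mu|\mathcal{D})$ really is a joint distribution is to sample from it: draw $\mu$ from the posterior, then draw $x$ using $\mu$ alone. This is a Monte Carlo sketch under hypothetical assumptions (a Beta(2, 2) prior and a data set with 7 ones and 3 zeros); it relies on the standard conjugacy result that the posterior is then $\text{Beta}(m+a,\,l+b)$.

```python
import random

# Hypothetical setup: Beta(a, b) prior on mu, data D with m ones and l zeros.
a, b = 2.0, 2.0
m, l = 7, 3

# Conjugacy: the posterior p(mu | D) is Beta(m + a, l + b).
post_alpha, post_beta = m + a, l + b

def sample_joint(rng):
    """Draw (x, mu) from p(x | mu) p(mu | D): first mu, then x given mu only."""
    mu = rng.betavariate(post_alpha, post_beta)  # mu ~ p(mu | D)
    x = 1 if rng.random() < mu else 0            # x ~ Bernoulli(mu); D is not used here
    return x, mu

def estimate_p_x1(n_samples=200_000, seed=0):
    """Monte Carlo estimate of p(x = 1 | D): fraction of joint draws with x = 1."""
    rng = random.Random(seed)
    hits = sum(sample_joint(rng)[0] for _ in range(n_samples))
    return hits / n_samples

estimate = estimate_p_x1()
exact = post_alpha / (post_alpha + post_beta)  # posterior mean (m + a) / (m + a + l + b)
print(round(estimate, 3), round(exact, 3))
```

Not using $\mathcal{D}$ when drawing $x$ is precisely the assumption $p(x=1|\mu,\mathcal{D}) = p(x=1|\mu)$; counting how often $x=1$ then ignores $\mu$, which is the sum rule at work.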
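Now apply the sum rule, with the sum over $\mu$ replaced by an integral because $\mu$ is continuous on $[0,1]$. Split $[0,1]$ into small intervals: the events $\{x=1,\ \mu \in \text{interval}\}$ are mutually exclusive and together make up the event $\{x=1\}$, so adding up their probabilities gives the probability of $x=1$. The integral is the limit of that sum, and it marginalizes $\mu$ out:
$$p(x=1|\mathcal{D}) = \int_0^1 p(x=1,\mu|\mathcal{D})\,\text{d}\mu = \int_0^1 p(x=1|\mu)\,p(\mu|\mathcal{D})\,\text{d}\mu.$$
Since $p(x=1|\mu)=\mu$ for a Bernoulli variable, this is just the posterior mean $\mathbb{E}[\mu|\mathcal{D}]$.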
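The "integral as the limit of a sum" step can be checked numerically with a midpoint Riemann sum. This sketch assumes, hypothetically, a Beta(2, 2) prior and data with 7 ones and 3 zeros, so the posterior is Beta(9, 5) and the exact answer is the posterior mean $9/14$. Each term of the sum plays the role of one mutually exclusive event $\{x=1,\ \mu \in \text{slice}\}$.

```python
import math

# Hypothetical setup: Beta(2, 2) prior, 7 ones and 3 zeros, so p(mu | D) = Beta(9, 5).
post_alpha, post_beta = 9.0, 5.0

def beta_pdf(mu, alpha, beta):
    """Density of Beta(alpha, beta) at mu in (0, 1), computed via log-gamma."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(mu) + (beta - 1) * math.log(1 - mu))

def p_x1_given_D(n_bins):
    """Discrete sum rule over n_bins equal slices of [0, 1] (midpoint rule).

    Each term approximates p(x = 1, mu in slice | D) = p(x = 1 | mu) p(mu | D) * width,
    using p(x = 1 | mu) = mu for a Bernoulli variable.
    """
    width = 1.0 / n_bins
    total = 0.0
    for i in range(n_bins):
        mu = (i + 0.5) * width  # midpoint, never exactly 0 or 1
        total += mu * beta_pdf(mu, post_alpha, post_beta) * width
    return total

exact = post_alpha / (post_alpha + post_beta)  # posterior mean E[mu | D] = 9/14
for n in (5, 50, 500):
    print(n, p_x1_given_D(n), exact)
```

As the number of slices grows, the sum of the joint probabilities settles on $\mathbb{E}[\mu|\mathcal{D}]$, which is the continuous version of the sum rule.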
The "discrete" rules you quoted can be turned into statements about densities just by taking limits as the events $\{X\in (a,b), Y\in (c,d)\}$ shrink around $x$ and $y$.