How to derive the posterior predictive distribution?


I often see the posterior predictive distribution mentioned in the context of machine learning and Bayesian inference. The definition is as follows:

$ p(D'|D) = \int_\theta p(D'|\theta)\,p(\theta|D)\,d\theta$

How/why does the integral on the right equal the probability distribution on the left? In other words, which laws of probability can I use to derive $p(D'|D)$ given the integral?

Edit - After further consideration, I think I am able to see much of the derivation. That is,

$p(D'|D) = \int_\theta p(D', \theta | D)\,d\theta$ via the law of total probability (marginalizing over $\theta$)

$p(D'|D) = \int_\theta p(D' | D, \theta)\, p(\theta | D)\,d\theta$ via the chain rule

But I don't understand why $D$ may be dropped from the list of conditioned variables belonging to the integral's first term.


There are 2 best solutions below

On BEST ANSWER

$p(D',\theta | D) = p(D' | \theta,D)\,p(\theta | D)$ follows from the product rule (the definition of conditional density), provided we have densities:

$p(D',\theta | D) = \frac{p(D', \theta, D)}{p(D)} = \frac{p(D'|\theta, D)\, p(\theta, D)}{p(D)} = p(D'|\theta, D)\, p(\theta | D)$.

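Now integrate out the nuisance variable $\theta$ on both sides. Your formula additionally relies on a Markov-type (conditional independence) assumption, $p(D'|\theta,D)=p(D'|\theta)$: once $\theta$ is known, the old data $D$ carry no further information about the new data $D'$. This holds, for example, when the observations are i.i.d. given $\theta$, and it is what allows $D$ to be dropped from the first term.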
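To see where the conditional independence comes from, here is a short sketch under one common modelling assumption that the question does not state explicitly: given $\theta$, the new data $D'$ and the old data $D$ are independent draws from the model, so the joint likelihood factorizes as $p(D', D|\theta) = p(D'|\theta)\,p(D|\theta)$. Then, wherever $p(D|\theta) > 0$,

$$
p(D'|\theta, D) = \frac{p(D', D|\theta)}{p(D|\theta)} = \frac{p(D'|\theta)\,p(D|\theta)}{p(D|\theta)} = p(D'|\theta).
$$

Substituting this into $\int_\theta p(D'|\theta, D)\,p(\theta|D)\,d\theta$ gives the formula in the question. If the model does not factorize this way (for example, a time series where $D'$ depends directly on $D$), the first term has to stay as $p(D'|\theta, D)$.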

On

To show this, one can follow a fairly standard argument. In what follows, for notational convenience, I have replaced your "$D$"s with "$S$"s. By the law of total expectation (the tower property of conditional expectation) and Fubini's theorem, applied to any bounded measurable function $f$ defined on the relevant sample space $\Omega$, we observe that

$$ \eqalign{ \int_{\Omega}f(s')\,p(s'\mid s)\,\mathrm ds' &= \mathbb E[f(S')\mid S=s] = \mathbb E\big[\mathbb E[f(S')\mid \Theta, S]\mid S=s\big]\\ &= \int_{\theta}\left(\int_{\Omega}f(s')\,p(s'\mid \theta, s)\,\mathrm ds'\right)p(\theta\mid s)\,\mathrm d\theta \\ &= \int_{\Omega}f(s')\left(\int_{\theta}p(s'\mid \theta, s)\,p(\theta\mid s)\,\mathrm d\theta\right)\mathrm ds' } $$

Since the far l.h.s. is equal to the far r.h.s. for all bounded measurable functions $f$, we conclude that, for almost every $s'$,

$$ p(s'\mid s)=\int_{\theta}p(s'\mid \theta, s)\,p(\theta\mid s)\,\mathrm d\theta $$
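As a numerical sanity check of this identity, here is a minimal Python sketch for a toy model that is my own choice and not part of the question: i.i.d. Bernoulli($\theta$) observations with a Beta($a$, $b$) prior. Under that assumption $p(s'\mid\theta, s) = p(s'\mid\theta)$, the posterior is Beta($a+k$, $b+n-k$) after $k$ successes in $n$ trials, and the predictive probability of a success has the known closed form $(a+k)/(a+b+n)$. The function names and the midpoint-rule integration are illustrative only.

```python
import math

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta in (0, 1), via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm
                    + (a - 1) * math.log(theta)
                    + (b - 1) * math.log(1 - theta))

def predictive_by_integration(k, n, a=1.0, b=1.0, grid=100_000):
    """Approximate p(x'=1 | D) = integral of p(x'=1 | theta) p(theta | D) dtheta.

    Assumes i.i.d. Bernoulli data, so p(x'=1 | theta, D) = theta.
    Uses the midpoint rule, which never evaluates theta at 0 or 1.
    """
    post_a, post_b = a + k, b + n - k   # conjugate Beta posterior parameters
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        theta = (i + 0.5) * h
        total += theta * beta_pdf(theta, post_a, post_b) * h
    return total

def predictive_closed_form(k, n, a=1.0, b=1.0):
    """Posterior mean of theta, which equals p(x'=1 | D) in this model."""
    return (a + k) / (a + b + n)

# Hypothetical data: 7 successes in 10 trials, uniform Beta(1, 1) prior.
numeric = predictive_by_integration(k=7, n=10)
exact = predictive_closed_form(k=7, n=10)
print(numeric, exact)
```

The two printed numbers should agree to many decimal places, which is the integral on the right-hand side reproducing the predictive probability on the left for this particular model.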