I have often seen the posterior predictive distribution mentioned in the context of machine learning and Bayesian inference. The definition is as follows:
$ p(D'|D) = \int_\theta p(D'|\theta)\,p(\theta|D)\,d\theta$
How/why does the integral on the right equal the probability distribution on the left? In other words, which laws of probability can I use to derive $p(D'|D)$ given the integral?
Edit - After further consideration, I think I am able to see much of the derivation. That is,
$p(D'|D) = \int_\theta p(D', \theta | D)\,d\theta$ via the law of total probability

$p(D'|D) = \int_\theta p(D' | D, \theta)\,p(\theta | D)\,d\theta$ via the chain rule
But I don't understand why $D$ may be dropped from the list of conditioning variables in the integral's first term.
$p(D',\theta | D) = p(D' | \theta,D)\,p(\theta | D)$ follows from the chain rule (the definition of conditional probability), provided we have densities:
$p(D',\theta | D) = \frac{p(D', \theta, D)}{p(D)} = \frac{p(D'|\theta, D)\, p(\theta, D)}{p(D)} = p(D'|\theta, D)\, p(\theta | D)$.
Now integrate out the nuisance variable $\theta$ on both sides. Your formula also carries a conditional-independence (Markov-type) assumption, $p(D'|\theta,D)=p(D'|\theta)$: given the parameter $\theta$, the new data $D'$ is independent of the old data $D$, i.e. $\theta$ fully summarizes everything $D$ tells us about $D'$. That assumption is what lets you drop $D$ from the first term.