Predictive probability inference


I'm quite new to probability theory, but I'm reading about deep learning and trying to understand some basic concepts.

They write down Bayes' rule in the following form:

$$ p(\theta|D) = \frac{p(\theta)p(D|\theta)}{p(D)} = \frac{p(\theta)p(D|\theta)}{\int_{\theta \in \Theta}p(D|\theta)p(\theta)d\theta}$$

where:

  • $\theta$ - parameters of the model
  • $D$ - input data
  • $p(\theta)$ - prior probability
  • $p(D|\theta)$ - likelihood
  • $p(\theta|D)$ - posterior probability
  • $p(D) = \int_{\theta \in \Theta}p(D|\theta)p(\theta)d\theta$ - evidence
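To make these quantities concrete, here is a minimal numerical sketch (not from the text) using a hypothetical coin-flip model: $\theta$ is the probability of heads, the prior is uniform on $[0,1]$, and the evidence integral $p(D)$ is approximated by a sum over a grid of $\theta$ values.

```python
import numpy as np

# Hypothetical coin-flip example: theta = P(heads), D = observed flips.
theta = np.linspace(0, 1, 1001)          # grid over Theta = [0, 1]
prior = np.ones_like(theta)              # uniform prior p(theta)
prior /= np.trapz(prior, theta)          # normalise so it integrates to 1

D = [1, 1, 0, 1]                         # data: 3 heads, 1 tail
likelihood = theta**sum(D) * (1 - theta)**(len(D) - sum(D))  # p(D|theta)

evidence = np.trapz(likelihood * prior, theta)               # p(D)
posterior = likelihood * prior / evidence                    # p(theta|D)

print(np.trapz(posterior, theta))        # ≈ 1, as a density must
```

With a uniform prior this posterior is the Beta(4, 2) density, so the grid approximation can be checked against the known closed form.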

Now, they introduce what they call the predictive probability, denoted $p(y|D)$ or $p(y|D,x)$, where $y$ is the correct answer for the next input $x$. Then they write: $$p(y|D) = \int_{\Theta}p(y|\theta)p(\theta|D)d\theta \propto \int_{\Theta}p(y|\theta)p(\theta)p(D|\theta)d\theta$$

At this point, I'm lost. Please could anybody explain in detail this marginalization?

1 Answer

By marginalising the joint distribution you have $$ \begin{align} p(y|D) &= \int_{\Theta} p(y, \theta | D)\operatorname{d}\theta \\ &= \int_{\Theta} p(y|\theta, D)p(\theta |D)\operatorname{d}\theta, \end{align} $$ and since you seem happy with the claim that $p(\theta|D) \propto p(\theta)p(D|\theta)$, it remains only to justify that $$ p(y|\theta, D)=p(y|\theta). $$

This is an assumption, but one that is normally taken as true in the context of parameterised statistical models, where the parameters $\theta$ completely specify the conditional distribution. In the particular problem you are considering, it amounts to assuming that "the distribution of the random variable $Y$ at the input point $x$ is completely specified by the parameter $\theta$" - that is, once you know a point $x$ and a parameter $\theta$, you know all you need to know about the distribution of $Y$. In terms of density functions: for any random variable $Z$, $p(Y|x, \theta, Z) = p(Y|x, \theta)$.
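To see the marginalisation in action, here is a small numerical sketch (my own illustration, not from the answer) for the same hypothetical coin-flip setup: a uniform prior and data $D$ of 3 heads and 1 tail give the posterior Beta(4, 2), and since $p(y=1|\theta) = \theta$, the predictive integral $\int_{\Theta} p(y|\theta)\,p(\theta|D)\,\mathrm{d}\theta$ reduces to the posterior mean of $\theta$.

```python
import numpy as np

# Posterior after 3 heads, 1 tail with a uniform prior: Beta(4, 2)
theta = np.linspace(0, 1, 1001)
posterior = theta**3 * (1 - theta)     # unnormalised Beta(4, 2) density
posterior /= np.trapz(posterior, theta)

# Predictive: p(y=1|D) = integral of p(y=1|theta) * p(theta|D) d(theta),
# and p(y=1|theta) = theta for a coin, so this is E[theta | D]
p_y1 = np.trapz(theta * posterior, theta)
print(p_y1)                            # ≈ 4/6, Laplace's rule of succession
```

Note that the predictive averages the model's prediction over the *entire* posterior rather than plugging in a single point estimate of $\theta$, which is exactly what the integral over $\Theta$ expresses.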