I'm quite new to probability theory, but I'm reading about deep learning and trying to understand some basic concepts.
The authors write down Bayes' rule in the following form:
$$ p(\theta|D) = \frac{p(\theta)p(D|\theta)}{p(D)} = \frac{p(\theta)p(D|\theta)}{\int_{\theta \in \Theta}p(D|\theta)p(\theta)d\theta}$$
where:
- $\theta$ - parameters of the model
- $D$ - input data
- $p(\theta)$ - prior probability
- $p(D|\theta)$ - likelihood
- $p(\theta|D)$ - posterior probability
- $p(D) = \int_{\theta \in \Theta}p(D|\theta)p(\theta)d\theta$ - evidence
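To make the pieces concrete, here is a minimal numerical sketch of Bayes' rule for a hypothetical coin-flip model (all the specifics - the Beta(2, 2) prior, the data of 7 heads in 10 flips - are illustrative choices, not from the book). The evidence $p(D)$ is approximated by a Riemann sum over a grid of $\theta$ values, and the posterior is the prior times the likelihood divided by that evidence:

```python
import math

# Hypothetical coin-flip model: theta = P(heads), prior Beta(2, 2),
# data D = 7 heads out of 10 flips.
N, K = 10, 7
grid = [i / 1000 for i in range(1, 1000)]   # grid over Theta = (0, 1)
d = grid[1] - grid[0]

def prior(t):
    # Beta(2, 2) density: 6 * t * (1 - t)
    return 6 * t * (1 - t)

def likelihood(t):
    # binomial likelihood p(D | theta)
    return math.comb(N, K) * t**K * (1 - t)**(N - K)

# evidence p(D) = integral over Theta of p(D|theta) p(theta) dtheta
evidence = sum(likelihood(t) * prior(t) for t in grid) * d

# posterior p(theta|D) = p(theta) p(D|theta) / p(D)
posterior = [prior(t) * likelihood(t) / evidence for t in grid]

# sanity check: the posterior density integrates to approximately 1
print(sum(p * d for p in posterior))
```

Dividing by the evidence is exactly what turns the unnormalised product $p(\theta)p(D|\theta)$ into a density that integrates to 1 over $\Theta$.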
Now, they introduce what they call the predictive probability, denoted $p(y|D)$ or $p(y|D,x)$, where $y$ is the correct answer for the next input $x$. Then they write: $$p(y|D) = \int_{\Theta}p(y|\theta)p(\theta|D)d\theta \propto \int_{\Theta}p(y|\theta)p(\theta)p(D|\theta)d\theta$$
At this point I'm lost. Could anybody explain this marginalization in detail?
By marginalising the joint distribution you have $$ \begin{align} p(y|D) &= \int_{\Theta} p(y, \theta | D)\operatorname{d}\theta \\ &= \int_{\Theta} p(y|\theta, D)p(\theta |D)\operatorname{d}\theta, \end{align} $$ and since you seem happy with the claim that $p(\theta|D) \propto p(\theta)p(D|\theta)$, it remains to claim that $$ p(y|\theta, D)=p(y|\theta). $$

This is an assumption, but one that is normally taken to hold in the context of parameterised statistical models, where the parameters $\theta$ completely specify the conditional distribution. In the particular problem you are considering, it amounts to the assumption that "the distribution of the random variable $Y$ at the input point $x$ is completely specified by the parameter $\theta$" - that is, once you know a point $x$ and a parameter $\theta$, you know all you need to know about the distribution of $Y$. In terms of density functions: for any random variable $Z$, $p(Y|x, \theta, Z) = p(Y|x, \theta)$.
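The marginalisation can also be checked numerically. Below is a minimal sketch using a hypothetical Beta-Bernoulli model (prior Beta(2, 2), data $D$ = 7 successes in 10 trials - illustrative choices, not from the question). Here $p(y=1|\theta) = \theta$, so the predictive probability is the integral of $\theta$ against the posterior, i.e. the posterior mean, which conjugacy lets us verify in closed form:

```python
import math

# Hypothetical Beta-Bernoulli model: theta = P(y = 1), prior Beta(a, b),
# data D = K successes in N trials.
a, b, N, K = 2, 2, 10, 7
grid = [i / 10000 for i in range(1, 10000)]
d = grid[1] - grid[0]

def prior(t):
    # Beta(a, b) density; B(a, b) = gamma(a) gamma(b) / gamma(a + b)
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return t**(a - 1) * (1 - t)**(b - 1) / B

def likelihood(t):
    # binomial likelihood p(D | theta)
    return math.comb(N, K) * t**K * (1 - t)**(N - K)

evidence = sum(likelihood(t) * prior(t) for t in grid) * d
posterior = [prior(t) * likelihood(t) / evidence for t in grid]

# predictive p(y = 1 | D) = integral of p(y = 1 | theta) p(theta | D) dtheta,
# using the assumption p(y | theta, D) = p(y | theta), with p(y = 1 | theta) = theta
pred = sum(t * p for t, p in zip(grid, posterior)) * d

# conjugacy gives the exact value (a + K) / (a + b + N) = 9/14
print(pred)
```

Note how the assumption $p(y|\theta, D) = p(y|\theta)$ enters: inside the integral, the probability of the next outcome depends on the data only through the posterior over $\theta$, never on $D$ directly.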