Conditional independence in posterior predictive distribution


The derivation of the posterior predictive distribution has the following steps:

$\begin{split}p(\tilde y\mid y) = &\int p(\tilde y, \theta\mid y)~\mathsf d\theta \\ = &\int p(\tilde y\mid\theta, y)\,p(\theta\mid y)~\mathsf d\theta \\ = &\int p(\tilde y\mid\theta)\,p(\theta\mid y)~\mathsf d\theta\end{split}$

where

  • $\tilde y$ : new data for prediction
  • $y$ : observed data
  • $\theta$ : unknown parameter.

$p(\tilde y\mid\theta, y)$ reduces to $p(\tilde y\mid\theta)$ due to conditional independence.

Can you explain why this is the case? The way I have convinced myself is that since $\tilde y$ is conditioned on $\theta$, everything the observed data $y$ tells us is already captured through $\theta$; and since $\tilde y$ and $y$ are independent given $\theta$, we can drop $y$ from the conditioning.

Is there a more formal explanation for this? Also, why doesn't this expression reduce to $p(\tilde y\mid \theta) \cdot p(y)$?
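A numeric sanity check can make the identity concrete. The sketch below is a hypothetical Beta-Bernoulli example (my own choice of model and numbers, not from the original post): it computes $p(\tilde y = 1\mid y)$ two ways, once via the closed form $(a+s)/(a+b+n)$ that the integral $\int p(\tilde y\mid\theta)\,p(\theta\mid y)\,\mathsf d\theta$ yields under a $\mathrm{Beta}(a,b)$ prior, and once directly as $p(\tilde y = 1, y)/p(y)$ without invoking the conditional-independence shortcut. The two agree.

```python
# Hypothetical Beta-Bernoulli sanity check (not from the original post).
# Prior: theta ~ Beta(a, b); data: Bernoulli(theta) draws.
from math import lgamma, exp

def log_beta(x, y):
    # log B(x, y) = log Gamma(x) + log Gamma(y) - log Gamma(x + y)
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def marginal(successes, trials, a, b):
    # p(y) for a specific binary sequence with s successes in n trials:
    # integral of theta^s (1-theta)^(n-s) Beta(theta; a, b) dtheta
    # = B(a + s, b + n - s) / B(a, b)
    return exp(log_beta(a + successes, b + trials - successes) - log_beta(a, b))

a, b = 2.0, 3.0          # prior hyperparameters (arbitrary choice)
y = [1, 0, 1, 1, 0]      # observed data
s, n = sum(y), len(y)

# Route 1: closed-form posterior predictive from the integral formula.
pred_integral = (a + s) / (a + b + n)

# Route 2: p(y_new = 1 | y) = p(y with one extra success appended) / p(y).
pred_direct = marginal(s + 1, n + 1, a, b) / marginal(s, n, a, b)

print(pred_integral, pred_direct)  # both equal 0.5 here
```

Note that route 2 never uses the reduction $p(\tilde y\mid\theta,y)=p(\tilde y\mid\theta)$ explicitly; the agreement is exactly what the derivation above predicts.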


BEST ANSWER

The unknown parameter $\theta$ is chosen so that the observed and future data are conditionally independent given its value. That is, $p(y,\tilde y\mid\theta)=p(y\mid \theta)\,p(\tilde y\mid\theta)$.

So, adding a few steps:

$\begin{align}p(\tilde y\mid y) &= \int p(\tilde y, \theta\mid y)~\mathsf d\theta &&\textsf{Law of Total Probability}\\[1ex] &= \int p(\tilde y\mid\theta, y)\,p(\theta\mid y)~\mathsf d\theta &&\textsf{Definition of Conditional Probability} \\[1ex] &= \int\dfrac{p(y,\tilde y\mid\theta)\,p(\theta\mid y)}{p(y\mid\theta)}~\mathsf d\theta &&\textsf{Definition of Conditional Probability}\\[1ex] &= \int \dfrac{p(y\mid\theta)\,p(\tilde y\mid\theta)\,p(\theta\mid y)}{p(y\mid\theta)}~\mathsf d\theta &&\textsf{Via the Conditional Independence} \\[1ex] &= \int p(\tilde y\mid\theta)\,p(\theta\mid y)~\mathsf d\theta &&\textsf{Cancelling the common factor.}\end{align}$

SECOND ANSWER

The 1st equality follows from the law of total probability: if $B_{1},B_{2},\dots,B_{n}$ partition the sample space, then $P(A)=P(A\cap B_{1})+P(A\cap B_{2})+\dots+P(A\cap B_{n})$; here the sum becomes an integral over $\theta$. The 2nd equality is conditioning via $P(x,y)=P(x)\,P(y\mid x)$, and the 3rd holds because $\tilde y\to\theta\to y$ forms a Markov chain: given $\theta$, $\tilde y$ is independent of $y$.
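The Markov-chain reading can also be checked by simulation. Below is a hypothetical sketch (the model and numbers are my own, not from the answer): draw $\theta$ from a $\mathrm{Beta}(2,2)$ prior, then two Bernoulli($\theta$) observations $y$ and $\tilde y$. Marginally the two are dependent, since they share $\theta$: analytically $P(\tilde y=1)=0.5$ but $P(\tilde y=1\mid y=1)=E[\theta^2]/E[\theta]=0.6$. Within a narrow slice of fixed $\theta$, however, observing $y$ carries no extra information about $\tilde y$.

```python
# Hypothetical simulation of the chain y <- theta -> y_new (my own example).
import random

random.seed(0)
N = 200_000
rows = []
for _ in range(N):
    theta = random.betavariate(2, 2)   # theta ~ Beta(2, 2) prior
    y_obs = random.random() < theta    # observed datum, y | theta
    y_new = random.random() < theta    # future datum,  y_new | theta
    rows.append((theta, y_obs, y_new))

def p_new_given(cond):
    # Monte Carlo estimate of P(y_new = 1 | cond holds)
    sel = [r for r in rows if cond(r)]
    return sum(r[2] for r in sel) / len(sel)

# Marginally, y and y_new are dependent: seeing y = 1 shifts beliefs about theta.
p_marginal = p_new_given(lambda r: True)    # close to 0.5
p_given_y1 = p_new_given(lambda r: r[1])    # close to 0.6

# Conditionally on theta (a narrow slice near 0.5), y adds no information.
in_slice = lambda r: 0.45 < r[0] < 0.55
p_theta = p_new_given(in_slice)
p_theta_y1 = p_new_given(lambda r: in_slice(r) and r[1])

print(p_marginal, p_given_y1, p_theta, p_theta_y1)
```

The gap between the first two estimates, and the near-equality of the last two, is exactly the conditional independence $p(\tilde y\mid\theta,y)=p(\tilde y\mid\theta)$ that the derivation relies on.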