Predictive distribution of SPGP


Eq. (8) in *Sparse Gaussian Processes using Pseudo-inputs* states that

\begin{align*} p(y^*|x^*,D,\bar{X})=\int{p(y^*|x^*,\bar{X},\bar{f})p(\bar{f}|D,\bar{X})d\bar{f}} \end{align*}

which can be derived as follows:

\begin{align*} p(y^*|x^*,D,\bar{X})&=\int{p(y^*,\bar{f}|x^*,D, \bar{X})d\bar{f}} \\&=\int{p(y^*|x^*,D, \bar{X},\bar{f})p(\bar{f}|x^*,D,\bar{X})d\bar{f}} \\&= \int{p(y^*|x^*,D, \bar{X},\bar{f})p(\bar{f}|D,\bar{X})d\bar{f}} \\&\stackrel{?}{=} \int{p(y^*|x^*, \bar{X},\bar{f})p(\bar{f}|D,\bar{X})d\bar{f}} \end{align*}

What I can't figure out is the last equality: why can $D$ be dropped from the conditioning set once the pseudo-data $\bar{X}$ and $\bar{f}$ are given?

We know that the predictive distribution $p(y^*|x^*,D)$ explicitly depends on $D$. So why do the given $\bar{X}$ and $\bar{f}$ eliminate the effect of $D$?

Currently, the only reason I can think of is that the pseudo-data $\bar{X}$ and $\bar{f}$ are "good" enough to represent $D$ approximately, in which case the last equation should be "$\approx$" rather than "$=$".
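One way to probe this suspicion numerically is to compare $\mathrm{Cov}(y^*, \mathbf{y} | \bar{f})$ under the full GP with the same quantity under the SPGP model. The following is a minimal NumPy sketch, not code from the paper: it assumes a squared-exponential kernel, toy 1-D inputs, and hypothetical values for the length-scale `ell`, signal variance `sf2` and noise variance `noise`. For SPGP it uses the covariance in which outputs are conditionally independent given $\bar{f}$, namely $Q + \mathrm{diag}(K - Q) + \sigma^2 I$ with $Q = K_{NM}K_M^{-1}K_{MN}$.

```python
import numpy as np


def rbf(a, b, ell=1.0, sf2=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)


def cov_given_fbar(S_oo, S_of, S_ff):
    """Covariance of the outputs given fbar (Gaussian conditioning via Schur complement)."""
    return S_oo - S_of @ np.linalg.solve(S_ff, S_of.T)


rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=8)      # training inputs (hypothetical toy data)
x_star = np.array([0.3])                # test input (hypothetical)
X_bar = np.linspace(-3.0, 3.0, 3)       # pseudo-inputs, M = 3 < N = 8
noise = 0.1                             # assumed noise variance sigma^2

Z = np.concatenate([X, x_star])         # all output locations: N training + 1 test
K_zz = rbf(Z, Z)
K_zm = rbf(Z, X_bar)
K_mm = rbf(X_bar, X_bar) + 1e-10 * np.eye(len(X_bar))  # small jitter for stability

# Full GP: outputs are f(Z) + noise, and fbar = f(X_bar) exactly.
full = cov_given_fbar(K_zz + noise * np.eye(len(Z)), K_zm, K_mm)

# SPGP model: Cov(outputs) = Q + diag(K - Q) + sigma^2 I, Cov(outputs, fbar) = K_zm.
Q = K_zm @ np.linalg.solve(K_mm, K_zm.T)
spgp_prior = Q + np.diag(np.diag(K_zz - Q)) + noise * np.eye(len(Z))
spgp = cov_given_fbar(spgp_prior, K_zm, K_mm)

# Last row, first N columns: Cov(y*, y | fbar). Zero means y* is independent of D given fbar.
print("full GP:", np.abs(full[-1, :-1]).max())
print("SPGP   :", np.abs(spgp[-1, :-1]).max())
```

The first printed value should be clearly nonzero and the second should be at round-off level: given $\bar{f}$, the training outputs still carry information about $y^*$ under the full GP, but not under the SPGP model, where the conditional covariance reduces to the diagonal matrix $\mathrm{diag}(K - Q) + \sigma^2 I$.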

Best answer

I've found relevant literature [1], which indeed states the assumption that

"Suppose now that $f_m$ is a sufficient statistic for the parameter $f$ in the sense that $z$ and $f$ are independent given $f_m$, i.e. it holds $p(z|f_m,f) = p(z|f_m)$".

which, in my case, can be interpreted as

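"$\bar{f}$ is a sufficient statistic for $D$, so $p(y^*|x^*,D,\bar{X},\bar{f})=p(y^*|x^*,\bar{X},\bar{f})$ holds."

So the last equality is exact within the SPGP model, which builds this conditional independence into its likelihood; it is only an approximation relative to the full GP, where $\bar{f}$ does not summarize $D$ completely.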
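Written as a factorization, the assumption yields the last step directly. This is a sketch assuming, as the SPGP likelihood does, that every output, the test output included, is conditionally independent of the others given $\bar{f}$, and writing $D=\{X,\mathbf{y}\}$:

\begin{align*} p(\mathbf{y},y^*|X,x^*,\bar{X},\bar{f}) &= p(y^*|x^*,\bar{X},\bar{f})\prod_{n=1}^{N}p(y_n|x_n,\bar{X},\bar{f}) \\ p(y^*|x^*,D,\bar{X},\bar{f}) &= \frac{p(\mathbf{y},y^*|X,x^*,\bar{X},\bar{f})}{\int{p(\mathbf{y},y^*|X,x^*,\bar{X},\bar{f})dy^*}} = \frac{p(y^*|x^*,\bar{X},\bar{f})\prod_{n}p(y_n|x_n,\bar{X},\bar{f})}{\prod_{n}p(y_n|x_n,\bar{X},\bar{f})} = p(y^*|x^*,\bar{X},\bar{f}) \end{align*}

The denominator reduces to the product over the training outputs because the factor $p(y^*|x^*,\bar{X},\bar{f})$ integrates to one over $y^*$.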