In the reference

Csató, Lehel; Opper, Manfred, *Sparse on-line Gaussian processes*, Neural Comput. 14, No. 3, 641-668 (2002). ZBL0987.62060,

the posterior distribution is given as follows.
In Bayesian learning, all information about the parameters that we wish to infer is encoded in probability distributions (Bernardo & Smith, 1994). In the GP framework, the parameters are functions, and the GP priors specify a Gaussian distribution over a function space. The posterior process is entirely specified by all its finite-dimensional marginals. Hence, letting $\boldsymbol{f} = \{f(x_1),...,f(x_M)\}$ be a set of function values such that $\boldsymbol{f}_D \subseteq \boldsymbol{f}$, where $\boldsymbol{f}_D$ is the set of values $f(x_i) = f_i$ with $x_i$ in the observed set of inputs, we compute the posterior distribution using the data likelihood together with the prior $p_0(\boldsymbol{f})$ as
$$ p_{post}(\boldsymbol{f}) = \frac { P(D|\boldsymbol{f})p_0(\boldsymbol{f})} { \langle P(D|\boldsymbol{f}_D) \rangle_0} $$
From this equation, the posterior mean and kernel functions are given by the parameterization lemma on page 662,
$$ \langle f_x \rangle_{post} = \langle f_x \rangle _0 + \sum_{i=1}^{N} K_0(x,x_i)q_i\\ K_{post}(x,x^{'}) = K_0(x,x^{'}) + \sum_{i,j=1}^{N} K_0(x,x_i)R_{ij}K_0(x_j,x^{'}) $$
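For intuition, the lemma can be checked numerically in the one case where everything is available in closed form: GP regression with a Gaussian likelihood. The following is a toy sketch of my own (the RBF kernel, zero prior mean, and the identifications $q = (K + \sigma^2 I)^{-1} y$, $R = -(K + \sigma^2 I)^{-1}$ are assumptions for this special case, not taken from the paper):

```python
import numpy as np

# Toy check of the parameterization lemma for a Gaussian (regression)
# likelihood with zero-mean RBF prior.  In this special case the lemma's
# coefficients have the closed forms q = (K + sigma^2 I)^{-1} y and
# R = -(K + sigma^2 I)^{-1}; the setup below is illustrative only.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=5)            # observed inputs x_1..x_N
y = np.sin(X) + 0.1 * rng.standard_normal(5)  # noisy targets
sigma2 = 0.01                                 # noise variance

def k0(a, b):
    """Prior RBF kernel K_0."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

A = k0(X, X) + sigma2 * np.eye(len(X))
q = np.linalg.solve(A, y)   # q_i
R = -np.linalg.inv(A)       # R_{ij}

# Lemma form: mean = sum_i K_0(x, x_i) q_i,
#             cov  = K_0(x, x') + sum_{ij} K_0(x, x_i) R_ij K_0(x_j, x').
Xs = np.linspace(-2.0, 2.0, 7)
Ks = k0(Xs, X)
mean_lemma = Ks @ q
cov_lemma = k0(Xs, Xs) + Ks @ R @ Ks.T

# Textbook GP-regression posterior for comparison.
mean_direct = Ks @ np.linalg.solve(A, y)
cov_direct = k0(Xs, Xs) - Ks @ np.linalg.solve(A, Ks.T)
print(np.allclose(mean_lemma, mean_direct), np.allclose(cov_lemma, cov_direct))
```

Both comparisons agree, which is the content of the lemma: the posterior is the prior plus a data-dependent correction expressed through $K_0$ evaluated at the observed inputs.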
The same lemma on page 662 gives the coefficients $q_i$ as
$$ q_i = \frac{ \int d \boldsymbol{f}_Dp_0(\boldsymbol{f}_D)\partial_iP(D|\boldsymbol{f}_D)} { \int d \boldsymbol{f}_Dp_0(\boldsymbol{f}_D)P(D|\boldsymbol{f}_D)} $$
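This ratio of integrals can also be verified numerically. Below is a Monte Carlo sketch for a single datum with Gaussian likelihood (the values $k$, $s2$, $y$ and the closed form $q = y/(k + s2)$ are my own toy setup, derived from $Z(m) = N(y; m, k + s2)$, not the paper's):

```python
import numpy as np

# Monte Carlo sketch of q = (∫ p0(f) ∂P(D|f) df) / (∫ p0(f) P(D|f) df)
# for one datum: prior f ~ N(0, k), likelihood P(D|f) = N(y; f, s2).
# The exact value in this toy case is q = y / (k + s2).
rng = np.random.default_rng(1)
k, s2, y = 1.0, 0.25, 0.8

f = rng.normal(0.0, np.sqrt(k), size=500_000)  # samples from p_0
lik = np.exp(-0.5 * (y - f) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
dlik = (y - f) / s2 * lik                      # d/df of the likelihood

q_mc = dlik.mean() / lik.mean()   # numerator / denominator of the ratio
q_exact = y / (k + s2)
print(abs(q_mc - q_exact))
```

With half a million samples the Monte Carlo estimate lands very close to the exact value, illustrating that $q_i$ is well defined as this ratio of prior expectations.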
When deriving the numerator of the $q_i$ equation above, the authors use a change of variables:

We can simplify the expression for $q_i$ by performing a change of variables in the numerator, $f^{'}_i=f_i-\langle f_i \rangle_0$, where $\langle f_i \rangle_0$ is the prior mean at $x_i$, and keeping all other variables unchanged, $f^{'}_j=f_j$ for $j \neq i$, leading to the numerator
$$ \int d \boldsymbol{f}_D\, p_0(\boldsymbol{f^{'}}_D)\,\partial_iP(D|f^{'}_1,...,f^{'}_i+\langle f_i \rangle_0,...,f^{'}_N). $$
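As a sanity check on the substitution itself (toy numbers of my own, not the paper's notation): with $f' = f - m$ we have $df = df'$, so $\int p_0(f)\,\partial P(f)\, df = \int p_0(f' + m)\,\partial P(f' + m)\, df'$. A 1D quadrature sketch:

```python
import numpy as np

# 1D check that shifting the integration variable leaves the integral
# unchanged: with f' = f - m, df = df', hence
#   ∫ p0(f) ∂P(f) df  =  ∫ p0(f' + m) ∂P(f' + m) df'.
m, k, y, s2 = 0.7, 1.0, 0.3, 0.2   # toy prior mean/variance, datum, noise

def p0(f):   # prior density N(m, k)
    return np.exp(-0.5 * (f - m) ** 2 / k) / np.sqrt(2 * np.pi * k)

def dP(f):   # derivative w.r.t. f of the likelihood N(y; f, s2)
    lik = np.exp(-0.5 * (y - f) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
    return (y - f) / s2 * lik

grid = np.linspace(-12.0, 12.0, 200_001)
dx = grid[1] - grid[0]
lhs = np.sum(p0(grid) * dP(grid)) * dx           # integrate over f
rhs = np.sum(p0(grid + m) * dP(grid + m)) * dx   # integrate over f' = f - m
print(np.isclose(lhs, rhs))
```

The two integrals agree: a change of variables only relabels the integration variable and shifts every occurrence of it, which is the mechanical rule the questions below turn on.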
The questions are
- Why, when applying Bayes' rule to calculate the posterior, is the equation $$ p_{post}(\boldsymbol{f}) = \frac { P(D|\boldsymbol{f})p_0(\boldsymbol{f})} { \langle P(D|\boldsymbol{f}_D) \rangle_0} $$
and not $$ p_{post}(\boldsymbol{f}|D) = \frac { P(D|\boldsymbol{f})p_0(\boldsymbol{f})} { P(D)} $$
- Why does the numerator of the $q_i$ equation $$ num(q_i) = \int d \boldsymbol{f}_D\, p_0(\boldsymbol{f}_D)\,\partial_iP(D|\boldsymbol{f}_D) $$
become, after the change of variables $f^{'}_i=f_i-\langle f_i \rangle_0$, $$ num(q_i) = \int d \boldsymbol{f}_D\, p_0(\boldsymbol{f^{'}}_D)\,\partial_iP(D|f^{'}_1,...,f^{'}_i+\langle f_i \rangle_0,...,f^{'}_N) $$
and not $$ num(q_i) = \int d(\boldsymbol{f}^{'}_D + \langle \boldsymbol{f}_D \rangle_0)\, p_0(\boldsymbol{f^{'}}_D + \langle \boldsymbol{f}_D \rangle_0)\,\partial_iP(D|f^{'}_1+\langle f_1 \rangle_0,...,f^{'}_i+\langle f_i \rangle_0,...,f^{'}_N+\langle f_N \rangle_0) $$
I assume that the value $ \langle \boldsymbol{f}_D \rangle_0 $ is a constant, which is why the derivation takes this form. If so, why are the $ \langle f_1 \rangle_0 $ and $ \langle f_N \rangle_0 $ terms cancelled while $ \langle f_i \rangle_0 $ survives?
Thanks in advance.