Sparse on-line Gaussian Processes derivation (Parameterization Lemma)


In the reference Sparse on-line Gaussian Processes

Csató, Lehel; Opper, Manfred, Sparse on-line Gaussian processes, Neural Comput. 14, No. 3, 641-668 (2002). ZBL0987.62060.

The posterior distribution is given as

  • In Bayesian learning, all information about the parameters that we wish to infer is encoded in probability distributions (Bernardo & Smith, 1994). In the GP framework, the parameters are functions, and the GP priors specify a Gaussian distribution over a function space. The posterior process is entirely specified by all its finite-dimensional marginals. Hence, let $\boldsymbol{f} = \{f(x_1),\ldots,f(x_M)\}$ be a set of function values such that $\boldsymbol{f}_D \subseteq \boldsymbol{f}$, where $\boldsymbol{f}_D$ is the set of $f(x_i) = f_i$ with $x_i$ in the observed set of inputs; we compute the posterior distribution using the data likelihood together with the prior $p_0(\boldsymbol{f})$ as

    $$ p_{post}(\boldsymbol{f}) = \frac { P(D|\boldsymbol{f})p_0(\boldsymbol{f})} { \langle P(D|\boldsymbol{f}_D) \rangle_0} $$

From this equation, the mean and kernel functions are given by the parameterization lemma on page 662:

$$ \langle f_x \rangle_{post} = \langle f_x \rangle_0 + \sum_{i=1}^{N} K_0(x,x_i)\,q_i\\ K_{post}(x,x^{'}) = K_0(x,x^{'}) + \sum_{i,j=1}^{N} K_0(x,x_i)\,R_{ij}\,K_0(x_j,x^{'}) $$
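As a sanity check on this parameterized form, here is a minimal numerical sketch for the special case of a Gaussian likelihood, where $q$ and $R$ are known in closed form, $q = (K + \sigma^2 I)^{-1}\boldsymbol{y}$ and $R = -(K + \sigma^2 I)^{-1}$ (zero prior mean assumed). This toy setup and the variable names are my own, not from the paper; it only verifies that the lemma's form reproduces the standard GP regression equations.

```python
import numpy as np

rng = np.random.default_rng(0)

def k0(a, b, ell=1.0):
    """Squared-exponential prior kernel K_0 (illustrative choice)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

X = rng.uniform(-2, 2, 5)                  # training inputs x_1..x_N
y = np.sin(X) + 0.1 * rng.standard_normal(5)
sigma2 = 0.1**2                            # Gaussian noise variance

K = k0(X, X)
A = K + sigma2 * np.eye(5)
q = np.linalg.solve(A, y)                  # coefficients q_i
R = -np.linalg.inv(A)                      # matrix R_{ij}

Xs = np.linspace(-2, 2, 7)                 # test inputs
Ks = k0(Xs, X)

# Posterior in the parameterized (lemma) form
mean_param = Ks @ q
cov_param = k0(Xs, Xs) + Ks @ R @ Ks.T

# Standard GP regression formulas for comparison
mean_std = Ks @ np.linalg.solve(A, y)
cov_std = k0(Xs, Xs) - Ks @ np.linalg.solve(A, Ks.T)

print(np.allclose(mean_param, mean_std))   # True
print(np.allclose(cov_param, cov_std))     # True
```

Note the sign: with $R = -(K + \sigma^2 I)^{-1}$, the additive term $\sum K_0 R K_0$ correctly *subtracts* variance at the observed inputs.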

The parameterization lemma on page 662 calculates the $q_i$ as,

$$ q_i = \frac{ \int d \boldsymbol{f}_Dp_0(\boldsymbol{f}_D)\partial_iP(D|\boldsymbol{f}_D)} { \int d \boldsymbol{f}_Dp_0(\boldsymbol{f}_D)P(D|\boldsymbol{f}_D)} $$
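For intuition, the $q_i$ formula can be checked by brute-force quadrature in one dimension. The following toy example (my own construction, not from the paper) takes a scalar Gaussian prior $p_0 = \mathcal{N}(0, k)$ and Gaussian likelihood $P(D|f) = \mathcal{N}(y \mid f, s^2)$, for which integration by parts gives the closed form $q = y/(k + s^2)$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

k, s2, y = 1.5, 0.2, 0.7                       # illustrative values

p0 = lambda f: norm.pdf(f, 0.0, np.sqrt(k))    # prior N(0, k)
lik = lambda f: norm.pdf(y, f, np.sqrt(s2))    # likelihood N(y | f, s2)
dlik = lambda f: (y - f) / s2 * lik(f)         # derivative of likelihood w.r.t. f

# q = [∫ p0(f) ∂_f P(D|f) df] / [∫ p0(f) P(D|f) df]
num, _ = quad(lambda f: p0(f) * dlik(f), -10, 10)
den, _ = quad(lambda f: p0(f) * lik(f), -10, 10)
q = num / den

print(q, y / (k + s2))    # the two values agree
```

This matches the coefficient $(K + \sigma^2 I)^{-1} y$ from the Gaussian-likelihood case above, as expected.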

When deriving the numerator of the $q_i$ equation above, the author says he used a change of variables, after which the equation becomes

  • We can simplify the expression for $q_i$ by performing a change of variables in the numerator, $f^{'}_i=f_i-\langle f_i \rangle_0$, where $\langle f_i \rangle_0$ is the prior mean at $x_i$, and keeping all other variables unchanged, $f^{'}_j=f_j,\ j \neq i$, leading to the numerator

    $$ \int d \boldsymbol{f}_D\,p_0(\boldsymbol{f^{'}}_D)\,\partial_iP(D|f^{'}_1,\ldots,f^{'}_i+\langle f_i \rangle_0,\ldots,f^{'}_N), $$
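The mechanics of this substitution can be seen numerically: a shift of a single integration variable has unit Jacobian, so the integral's value is unchanged, and the prior mean reappears only inside the arguments where that variable occurs. Below is a toy 2-D check of my own (a stand-in integrand `g` plays the role of $\partial_i P(D|\boldsymbol{f}_D)$, and only $f_1$ is shifted):

```python
import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([0.8, -0.3])                # prior means <f_1>_0, <f_2>_0
cov = np.array([[1.0, 0.4], [0.4, 1.0]])
p0 = multivariate_normal(mean, cov).pdf     # prior p_0(f_1, f_2)

g = lambda f1, f2: np.exp(-f1**2 - 0.5 * f2**2)   # stand-in for d_i P(D|f_D)

# Simple Riemann-sum quadrature on a grid wide enough that tails are negligible
grid = np.linspace(-8, 8, 400)
h = grid[1] - grid[0]
F1, F2 = np.meshgrid(grid, grid, indexing="ij")

# Original integral over (f_1, f_2)
I_orig = np.sum(p0(np.dstack([F1, F2])) * g(F1, F2)) * h * h

# After f'_1 = f_1 - <f_1>_0 with f_2 unchanged: integrate over f'_1,
# evaluating the integrand at f_1 = f'_1 + <f_1>_0 (unit Jacobian)
I_shift = np.sum(p0(np.dstack([F1 + mean[0], F2])) * g(F1 + mean[0], F2)) * h * h

print(np.isclose(I_orig, I_shift))          # True
```

The shift $\langle f_1 \rangle_0$ appears in *every* place $f_1$ occurs in the integrand; it is only after absorbing the shift into the (stationary) Gaussian prior that the mean survives solely inside the likelihood's $i$-th argument.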

The questions are:

  1. Why, when applying Bayes' rule to calculate the posterior, is the equation $$ p_{post}(\boldsymbol{f}) = \frac { P(D|\boldsymbol{f})p_0(\boldsymbol{f})} { \langle P(D|\boldsymbol{f}_D) \rangle_0} $$

not $$ p_{post}(\boldsymbol{f}|D) = \frac { P(D|\boldsymbol{f})p_0(\boldsymbol{f})} { \langle P(D) \rangle_0} $$

  2. Why does the numerator of the $q_i$ equation $$ num(q_i) = { \int d \boldsymbol{f}_D\,p_0(\boldsymbol{f}_D)\,\partial_iP(D|\boldsymbol{f}_D)} $$

become, after the change of variables $f^{'}_i=f_i-\langle f_i \rangle_0$, $$ num(q_i) = \int d \boldsymbol{f}_D\,p_0(\boldsymbol{f^{'}}_D)\,\partial_iP(D|f^{'}_1,\ldots,f^{'}_i+\langle f_i \rangle_0,\ldots,f^{'}_N) $$

and not $$ num(q_i) = \int d(\boldsymbol{f}^{'}_D + \langle \boldsymbol{f}_D \rangle_0)\, p_0(\boldsymbol{f^{'}}_D + \langle \boldsymbol{f}_D \rangle_0)\,\partial_iP(D|f^{'}_1+\langle f_1 \rangle_0,\ldots,f^{'}_i+\langle f_i \rangle_0,\ldots,f^{'}_N+\langle f_N \rangle_0)? $$

I assume that the value $ \langle \boldsymbol{f}_D \rangle_0 $ is constant, which is why the derivation takes this form. If so, why are the $ \langle f_1 \rangle_0 $ and $ \langle f_N \rangle_0 $ terms cancelled while $ \langle f_i \rangle_0 $ survives?

Thanks in advance.