In the reference

Csató, Lehel; Opper, Manfred, *Sparse on-line Gaussian processes*, Neural Comput. 14, No. 3, 641-668 (2002). ZBL0987.62060,

the posterior distribution is given as follows.
In Bayesian learning, all information about the parameters that we wish to infer is encoded in probability distributions (Bernardo & Smith, 1994). In the GP framework, the parameters are functions, and the GP priors specify a Gaussian distribution over a function space. The posterior process is entirely specified by all its finite-dimensional marginals. Hence, letting $\boldsymbol{f} = \{f(x_1),...,f(x_M)\}$ be a set of function values such that $\boldsymbol{f}_D \subseteq \boldsymbol{f}$, where $\boldsymbol{f}_D$ is the set of values $f(x_i) = f_i$ with $x_i$ in the observed set of inputs, we compute the posterior distribution using the data likelihood together with the prior $p_0(\boldsymbol{f})$ as
$$ p_{post}(\boldsymbol{f}) = \frac { P(D|\boldsymbol{f})p_0(\boldsymbol{f})} { \langle P(D|\boldsymbol{f}_D) \rangle_0} $$
From this equation, the posterior mean and kernel functions are given by the parameterization lemma on page 662,
$$ \langle f_x \rangle_{post} = \langle f_x \rangle _0 + \sum_{i=1}^{N} K_0(x,x_i)q_i\\ K_{post}(x,x^{'}) = K_0(x,x^{'}) + \sum_{i,j=1}^{N} K_0(x,x_i)R_{ij}K_0(x_j,x^{'}) $$
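For intuition, the lemma can be checked numerically in the one case where everything is available in closed form: GP regression with a Gaussian likelihood. The following is a toy sketch of my own (the RBF kernel, zero prior mean, and the identifications $q = (K + \sigma^2 I)^{-1} y$, $R = -(K + \sigma^2 I)^{-1}$ are assumptions for this special case, not taken from the paper):

```python
import numpy as np

# Toy check of the parameterization lemma for a Gaussian (regression)
# likelihood with zero-mean RBF prior.  In this special case the lemma's
# coefficients have the closed forms q = (K + sigma^2 I)^{-1} y and
# R = -(K + sigma^2 I)^{-1}; the setup below is illustrative only.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=5)            # observed inputs x_1..x_N
y = np.sin(X) + 0.1 * rng.standard_normal(5)  # noisy targets
sigma2 = 0.01                                 # noise variance

def k0(a, b):
    """Prior RBF kernel K_0."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

A = k0(X, X) + sigma2 * np.eye(len(X))
q = np.linalg.solve(A, y)   # q_i
R = -np.linalg.inv(A)       # R_{ij}

# Lemma form: mean = sum_i K_0(x, x_i) q_i,
#             cov  = K_0(x, x') + sum_{ij} K_0(x, x_i) R_ij K_0(x_j, x').
Xs = np.linspace(-2.0, 2.0, 7)
Ks = k0(Xs, X)
mean_lemma = Ks @ q
cov_lemma = k0(Xs, Xs) + Ks @ R @ Ks.T

# Textbook GP-regression posterior for comparison.
mean_direct = Ks @ np.linalg.solve(A, y)
cov_direct = k0(Xs, Xs) - Ks @ np.linalg.solve(A, Ks.T)
print(np.allclose(mean_lemma, mean_direct), np.allclose(cov_lemma, cov_direct))
```

Both comparisons agree, which is the content of the lemma: the posterior is the prior plus a data-dependent correction expressed through $K_0$ evaluated at the observed inputs.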
The same lemma on page 662 gives the coefficients $q_i$ as
$$ q_i = \frac{ \int d \boldsymbol{f}_Dp_0(\boldsymbol{f}_D)\partial_iP(D|\boldsymbol{f}_D)} { \int d \boldsymbol{f}_Dp_0(\boldsymbol{f}_D)P(D|\boldsymbol{f}_D)} $$
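This ratio of integrals can also be verified numerically. Below is a Monte Carlo sketch for a single datum with Gaussian likelihood (the values $k$, $s2$, $y$ and the closed form $q = y/(k + s2)$ are my own toy setup, derived from $Z(m) = N(y; m, k + s2)$, not the paper's):

```python
import numpy as np

# Monte Carlo sketch of q = (∫ p0(f) ∂P(D|f) df) / (∫ p0(f) P(D|f) df)
# for one datum: prior f ~ N(0, k), likelihood P(D|f) = N(y; f, s2).
# The exact value in this toy case is q = y / (k + s2).
rng = np.random.default_rng(1)
k, s2, y = 1.0, 0.25, 0.8

f = rng.normal(0.0, np.sqrt(k), size=500_000)  # samples from p_0
lik = np.exp(-0.5 * (y - f) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
dlik = (y - f) / s2 * lik                      # d/df of the likelihood

q_mc = dlik.mean() / lik.mean()   # numerator / denominator of the ratio
q_exact = y / (k + s2)
print(abs(q_mc - q_exact))
```

With half a million samples the Monte Carlo estimate lands very close to the exact value, illustrating that $q_i$ is well defined as this ratio of prior expectations.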
When deriving the numerator of the $q_i$ equation above, the authors use a change of variables:

We can simplify the expression for $q_i$ by performing a change of variables in the numerator, $f^{'}_i=f_i-\langle f_i \rangle_0$, where $\langle f_i \rangle_0$ is the prior mean at $x_i$, and keeping all other variables unchanged, $f^{'}_j=f_j$ for $j \neq i$, leading to the numerator
$$ \int d \boldsymbol{f}_D\, p_0(\boldsymbol{f^{'}}_D)\,\partial_iP(D|f^{'}_1,...,f^{'}_i+\langle f_i \rangle_0,...,f^{'}_N). $$
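As a sanity check on the substitution itself (toy numbers of my own, not the paper's notation): with $f' = f - m$ we have $df = df'$, so $\int p_0(f)\,\partial P(f)\, df = \int p_0(f' + m)\,\partial P(f' + m)\, df'$. A 1D quadrature sketch:

```python
import numpy as np

# 1D check that shifting the integration variable leaves the integral
# unchanged: with f' = f - m, df = df', hence
#   ∫ p0(f) ∂P(f) df  =  ∫ p0(f' + m) ∂P(f' + m) df'.
m, k, y, s2 = 0.7, 1.0, 0.3, 0.2   # toy prior mean/variance, datum, noise

def p0(f):   # prior density N(m, k)
    return np.exp(-0.5 * (f - m) ** 2 / k) / np.sqrt(2 * np.pi * k)

def dP(f):   # derivative w.r.t. f of the likelihood N(y; f, s2)
    lik = np.exp(-0.5 * (y - f) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
    return (y - f) / s2 * lik

grid = np.linspace(-12.0, 12.0, 200_001)
dx = grid[1] - grid[0]
lhs = np.sum(p0(grid) * dP(grid)) * dx           # integrate over f
rhs = np.sum(p0(grid + m) * dP(grid + m)) * dx   # integrate over f' = f - m
print(np.isclose(lhs, rhs))
```

The two integrals agree: a change of variables only relabels the integration variable and shifts every occurrence of it, which is the mechanical rule the questions below turn on.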
The questions are
- Why, when applying Bayes' rule to calculate the posterior, is the equation $$ p_{post}(\boldsymbol{f}) = \frac { P(D|\boldsymbol{f})p_0(\boldsymbol{f})} { \langle P(D|\boldsymbol{f}_D) \rangle_0} $$
and not $$ p_{post}(\boldsymbol{f}|D) = \frac { P(D|\boldsymbol{f})p_0(\boldsymbol{f})} { P(D)} $$
- Why does the numerator of the $q_i$ equation $$ num(q_i) = \int d \boldsymbol{f}_D\, p_0(\boldsymbol{f}_D)\,\partial_iP(D|\boldsymbol{f}_D) $$
become, after the change of variables $f^{'}_i=f_i-\langle f_i \rangle_0$, $$ num(q_i) = \int d \boldsymbol{f}_D\, p_0(\boldsymbol{f^{'}}_D)\,\partial_iP(D|f^{'}_1,...,f^{'}_i+\langle f_i \rangle_0,...,f^{'}_N) $$
and not $$ num(q_i) = \int d(\boldsymbol{f}^{'}_D + \langle \boldsymbol{f}_D \rangle_0)\, p_0(\boldsymbol{f^{'}}_D + \langle \boldsymbol{f}_D \rangle_0)\,\partial_iP(D|f^{'}_1+\langle f_1 \rangle_0,...,f^{'}_i+\langle f_i \rangle_0,...,f^{'}_N+\langle f_N \rangle_0) $$
I assume that the value $ \langle \boldsymbol{f}_D \rangle_0 $ is a constant, which is why the derivation takes this form. If so, why are the $ \langle f_1 \rangle_0 $ and $ \langle f_N \rangle_0 $ terms cancelled while $ \langle f_i \rangle_0 $ survives?
Thanks in advance.