Show prediction with Gaussian Process is not affected by the change of distribution of the covariates


The question is based on the book *Dataset Shift in Machine Learning*, edited by Quiñonero-Candela et al.

I'm confused about the following passage, which concerns seeing that a Gaussian process is a conditional model in which the distribution of the covariates has no effect on the predictions. The authors write:

> This follows from the fact that no matter what other covariate samples we see, the prediction for our current data remains the same; that is, Gaussian processes satisfy Kolmogorov consistency: $$ \begin{aligned} P\left(\left\{y_i\right\} \mid\left\{\mathbf{x}_i\right\},\left\{\mathbf{x}^k, y^k\right\}\right) & =\int d y^* P\left(\left\{y_i\right\}, y^* \mid\left\{\mathbf{x}_i\right\}, \mathbf{x}^*,\left\{\mathbf{x}^k, y^k\right\}\right) \\ & =P\left(\left\{y_i\right\} \mid\left\{\mathbf{x}_i\right\}, \mathbf{x}^*,\left\{\mathbf{x}^k, y^k\right\}\right) \end{aligned} $$ where (1.2) results from the definition of a Gaussian process, and (1.3) from basic probability theory (marginalization). In this equation the $y_i$ are the test targets, $\mathbf{x}_i$ the test covariates, $\mathbf{x}^k$ and $y^k$ the training data, and $\mathbf{x}^*, y^*$ a potential extra training point. However, we never know the target $y^*$ and so it is marginalized over. The result is that introducing the new covariate point $\mathbf{x}^*$ has had no predictive effect.
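To make the claim concrete, here is a small NumPy sketch of what I believe the passage is saying (the RBF kernel, the noise level, and all data values are my own illustrative choices, not from the book): the GP posterior over the test targets is computed twice, once over the test inputs alone and once jointly with an extra covariate $\mathbf{x}^*$; since the joint posterior is Gaussian, marginalizing out $y^*$ just drops its row and column, and the two answers coincide.

```python
import numpy as np

def rbf(a, b):
    # Squared-exponential kernel with unit lengthscale/variance (illustrative)
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

X_train = np.array([-1.5, 0.0, 0.8])   # {x^k}
y_train = np.array([0.3, -0.1, 0.5])   # {y^k}
X_test = np.array([0.4, 1.2])          # {x_i}
x_star = np.array([2.5])               # extra covariate x*, y* never observed
noise = 1e-2                           # observation-noise variance (assumed)

def posterior(X_pred):
    """Standard GP regression posterior mean and covariance at X_pred."""
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_pred, X_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = rbf(X_pred, X_pred) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

# Posterior over the test targets alone
m_a, C_a = posterior(X_test)

# Joint posterior over (test targets, y*), then marginalize y* out:
# for a Gaussian, marginalization just deletes the y* row and column.
m_b, C_b = posterior(np.concatenate([X_test, x_star]))
m_b, C_b = m_b[:2], C_b[:2, :2]

print(np.allclose(m_a, m_b), np.allclose(C_a, C_b))  # True True
```

So the extra covariate point changes nothing about the prediction at the original test inputs, which matches the quoted conclusion.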

From Wikipedia and other online sources, the standard definition of a Gaussian process is that any finite collection of random variables in the process is jointly multivariate Gaussian. But I don't see how that explains the first equality.
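Here is the property in miniature, as far as I can tell (all numbers made up): the GP assigns to the inputs $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}^*)$ a joint Gaussian whose mean and covariance, restricted to $(\mathbf{x}_1, \mathbf{x}_2)$, are exactly what the GP assigns to those two inputs alone; so integrating the three-dimensional density over $y^*$ should recover the two-dimensional one. A numerical check:

```python
import numpy as np

def mvn_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x."""
    d = mean.size
    diff = np.atleast_2d(x) - mean
    quad = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=-1)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

# Kernel matrix at three inputs (x_1, x_2, x*) -- values are made up;
# any positive-definite covariance would do.
cov3 = np.array([[1.0, 0.5, 0.3],
                 [0.5, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
mean3 = np.zeros(3)

# Integrate the 3-d density over y* on a fine grid...
y1, y2 = 0.7, -0.2
grid = np.linspace(-8.0, 8.0, 4001)
pts = np.column_stack([np.full_like(grid, y1),
                       np.full_like(grid, y2),
                       grid])
integral = np.sum(mvn_pdf(pts, mean3, cov3)) * (grid[1] - grid[0])

# ...and compare with the 2-d density obtained by dropping the y* row/column
marginal = mvn_pdf([y1, y2], mean3[:2], cov3[:2, :2])[0]

print(np.isclose(integral, marginal, atol=1e-6))  # True
```

This checks numerically that marginalizing the bigger Gaussian gives back the smaller one, but I'd like to understand why that is exactly the content of the first equality above.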