When applying Gaussian process machine learning to regression problems, where we want to determine the value a function $f$ takes at a new input point $x_{n+1}$ given observations of its values at $\{x_1,\dots,x_n\}$, we can apply the continuous form of Bayes' theorem to conclude that the posterior distribution is obtained by marginalising over the hyperparameters $\theta$:
$$ P(f(x_{n+1})\,|\,D) = \int P(f(x_{n+1})\,|\,\theta,D)\,P(\theta\,|\,D)\,d\theta $$ where $D$ denotes our observations. This integral can be evaluated using Monte Carlo methods, giving the estimate:
$$ P(f(x_{n+1})\,|\,D) \approx \frac{1}{R}\sum_{i=1}^{R} P(f(x_{n+1})\,|\,\theta_i,D), $$ where the $\theta_i$ are samples drawn from $P(\theta\,|\,D)$.
My question is about how to interpret this last expression. Since we are performing Gaussian process regression, the output should be a Gaussian; but how do we determine the Gaussian pdf corresponding to this sum? A Gaussian is characterised by its mean and variance, so surely it is these we need to be approximating.
Each of the $P(f(x_{n+1})\,|\,\theta_i,D)$ is a multivariate Gaussian, but a sum of Gaussian pdfs does not give another Gaussian pdf. Should we instead view this as taking an average of Gaussian random variables? In that case we would average the means, and so on.
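For concreteness, here is a sketch of the terms being averaged, using toy data and a hypothetical setup of my own (the `rbf` kernel, the data, and the uniform draws standing in for lengthscale samples from $P(\theta\,|\,D)$ are all assumptions): each hyperparameter sample $\theta_i$ yields its own Gaussian predictive, with its own mean and variance.

```python
import numpy as np

def rbf(X1, X2, lengthscale):
    # Squared-exponential kernel with unit signal variance (an assumption).
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_star, X, y, lengthscale, noise=1e-2):
    # Standard GP regression predictive for a single test point x_star:
    # returns the mean and variance of P(f(x_star) | theta, D).
    K = rbf(X, X, lengthscale) + noise * np.eye(len(X))
    k_star = rbf(np.array([x_star]), X, lengthscale)[0]
    alpha = np.linalg.solve(K, y)
    mu = k_star @ alpha
    var = 1.0 - k_star @ np.linalg.solve(K, k_star) + noise
    return mu, var

rng = np.random.default_rng(1)
X = np.array([0.0, 0.5, 1.0])
y = np.sin(X)

# Hypothetical stand-in for posterior samples theta_i ~ P(theta | D):
thetas = rng.uniform(0.3, 1.0, size=200)

# One Gaussian predictive P(f(x_{n+1}) | theta_i, D) per hyperparameter sample.
preds = [gp_predict(1.5, X, y, t) for t in thetas]
```

The point is that `preds` holds $R$ different (mean, variance) pairs, one Gaussian per $\theta_i$; the question is what distribution their average pdf represents.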
Should we instead be viewing this as taking an average of gaussian random variables?
Yes. See Section 3.1 of Doucet & Johansen (2008), "A Tutorial on Particle Filtering and Smoothing"; they make this explicit in their discussion of Monte Carlo methods, and use notation that prevents this confusion from arising in the first place.
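A minimal numerical check of that reading (the $\mu_i, \sigma_i^2$ below are made-up stand-ins for the moments of each $P(f(x_{n+1})\,|\,\theta_i,D)$, not anything from a real GP fit): the Monte Carlo average of the Gaussian pdfs is a Gaussian *mixture*, whose mean is the average of the component means and whose variance follows from the law of total variance.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 5000

# Stand-in per-sample predictive moments: mu_i and sigma_i^2 for each theta_i.
mu = rng.normal(0.0, 0.5, size=R)
var = rng.uniform(0.5, 1.5, size=R)

# (1/R) * sum_i N(f | mu_i, sigma_i^2) is a mixture, not a single Gaussian.
# Its first two moments come from the laws of total expectation and variance:
mix_mean = mu.mean()             # E[mu_i]
mix_var = var.mean() + mu.var()  # E[sigma_i^2] + Var(mu_i)

# Sanity check by sampling: draw one f from each mixture component and
# compare the empirical moments against the formulas above.
draws = rng.normal(mu, np.sqrt(var))
print(mix_mean, mix_var)  # close to draws.mean(), draws.var()
```

Note the extra `mu.var()` term: the mixture is wider than any single component on average, which is exactly the hyperparameter uncertainty being propagated.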