Reading about the Bayesian approach to model selection, I was wondering about the more precise mathematical meaning of Bayesian model averaging. Say, for example, that we are given a dataset $\mathcal{D} = \{\textbf{x}_i, y_i\}_{i=1}^n$, where $\textbf{x} \in \mathbb{R}^D$ and $y \in \mathbb{R}$, and we want to perform a regression task. With a Bayesian approach, after putting a prior $\theta \sim \mathcal{N}(0, v^2I)$ on the parameters, we want to estimate the posterior distribution:
$$p(\theta|\mathcal{D}) = \frac{p(\mathcal{D}|\theta)p(\theta)}{p(\mathcal{D})}$$
and finally the predictive posterior:
$$p(y|x,\mathcal{D}) = \int_{\theta \in \Theta}p(y|x, \mathcal{D},\theta)\,p(\theta|\mathcal{D})\,d\theta.$$
In particular, the above quantity can also be written as $\mathbb{E}_{p(\theta|\mathcal{D})}[p(y|x,\mathcal{D},\theta)]$.
So we are basically computing the expected value of $p(y|x,\mathcal{D},\theta)$ under the posterior distribution. (What does this mean in practice?)
My issue is that I can't manage to actually visualize what is happening here.
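To make the averaging concrete, here is a minimal toy sketch I put together (hypothetical 1-D data, and the conjugate Gaussian case where the posterior is available in closed form): each posterior draw $\theta^{(i)}$ produces its own prediction, and averaging those predictions over the draws is exactly the expectation above, so the Monte Carlo average should match the exact predictive mean $x_*^\top \mu_n$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data (hypothetical): y = 2x + Gaussian noise
n, sigma2, v2 = 50, 0.25, 10.0
X = rng.uniform(-1, 1, size=(n, 1))
y = 2.0 * X[:, 0] + rng.normal(0, np.sqrt(sigma2), n)

# Conjugate Gaussian posterior p(theta | D) for linear regression
# with prior theta ~ N(0, v2 * I) and known noise variance sigma2:
#   Sigma_n = (X^T X / sigma2 + I / v2)^{-1},  mu_n = Sigma_n X^T y / sigma2
Sigma_n = np.linalg.inv(X.T @ X / sigma2 + np.eye(1) / v2)
mu_n = Sigma_n @ X.T @ y / sigma2

# Draw posterior samples and average their predictions at a test input:
# this is the Monte Carlo version of E_{p(theta|D)}[...] above.
x_star = np.array([0.5])
thetas = rng.multivariate_normal(mu_n, Sigma_n, size=5000)
preds = thetas @ x_star                 # one prediction per posterior draw
print(preds.mean(), float(x_star @ mu_n))  # MC average vs exact predictive mean
```

The point of the sketch is that "weighting by the posterior" shows up simply as drawing the $\theta^{(i)}$ from the posterior: plausible parameter values contribute more draws, so their predictions dominate the average.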
Also, how can we get to the expression for the predictive distribution in a rigorous mathematical manner?
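For context, the farthest I get on my own is marginalizing the joint over $\theta$ and then applying the product rule, assuming the new input $x$ alone carries no information about $\theta$ (so $p(\theta|x,\mathcal{D}) = p(\theta|\mathcal{D})$):

$$p(y|x,\mathcal{D}) = \int_{\Theta} p(y,\theta|x,\mathcal{D})\,d\theta = \int_{\Theta} p(y|x,\mathcal{D},\theta)\,p(\theta|x,\mathcal{D})\,d\theta = \int_{\Theta} p(y|x,\mathcal{D},\theta)\,p(\theta|\mathcal{D})\,d\theta.$$

Is this the right way to justify it?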
Another issue I have: say we are not in the easy conjugate case where both $p(\theta)$ and $p(y|x,\mathcal{D}, \theta)$ are Gaussian, so the posterior has no closed form and we can only approximate it (using, for example, Gibbs sampling).
How does the above expression translate to this case?
I guess that in this case we only have access to, say, $m$ samples from the posterior, so that
$$\mathbb{E}_{p(\theta|\mathcal{D})}[p(y|x,\mathcal{D},\theta)] \approx \frac{1}{m}\sum_{i=1}^m p(y|x, \mathcal{D},\theta^{(i)})\,p(\theta^{(i)}|\mathcal{D}).$$
Is this writing correct?
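Or should it rather be the plain average $\frac{1}{m}\sum_i p(y|x,\mathcal{D},\theta^{(i)})$, since the draws already come from the posterior? Here is a toy sketch of the plain-average version (hypothetical 1-D logistic-regression data, and random-walk Metropolis instead of Gibbs, simply because it fits in a few lines):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy logistic-regression data (hypothetical, non-conjugate model)
n, v2 = 100, 10.0
x = rng.uniform(-2, 2, n)
theta_true = 1.5
y = rng.binomial(1, sigmoid(theta_true * x))

def log_post(theta):
    # log p(D | theta) + log p(theta), up to an additive constant
    p = sigmoid(theta * x)
    return np.sum(y * np.log(p) + (1 - y) * np.log1p(-p)) - theta**2 / (2 * v2)

# Random-walk Metropolis: m approximate draws from p(theta | D)
m, theta, samples = 5000, 0.0, []
for _ in range(m):
    prop = theta + rng.normal(0, 0.5)
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)
samples = np.array(samples[1000:])   # drop burn-in

# Predictive probability at a new input: a plain average over the draws
x_new = 1.0
pred = sigmoid(samples * x_new).mean()
print(pred)
```

(The data-generating value $\theta_{\text{true}} = 1.5$ and the proposal scale $0.5$ are arbitrary choices for the toy example.)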
Many thanks,
James