Perplexities about Bayesian inference and model averaging (BMA)


Reading about the Bayesian approach to model selection, I was wondering about the precise mathematical meaning of Bayesian model averaging. Say, for example, that we are given a dataset $\mathcal{D} = \{\textbf{x}_i, y_i\}_{i=1}^n$ where $\textbf{x} \in \mathbb{R}^D$, $y \in \mathbb{R}$, and we want to perform a regression task. With a Bayesian approach, after placing a prior $\theta \sim \mathcal{N}(0, v^2I)$ on the parameters, we want to estimate the posterior distribution:

$$p(\theta|\mathcal{D}) = \frac{p(\mathcal{D}|\theta)p(\theta)}{p(\mathcal{D})}$$

and finally the predictive posterior:

$$p(y|x,\mathcal{D}) = \int_{\theta \in \Theta}p(y|x, \mathcal{D},\theta)\,p(\theta|\mathcal{D})\,d\theta.$$

In particular, the above quantity can also be written as $\mathbb{E}_{p(\theta|\mathcal{D})}[p(y|x,\mathcal{D},\theta)]$.

So we're basically computing the expected value of the density $p(y|x,\mathcal{D},\theta)$ under the posterior distribution over $\theta$ (what does this mean, in practice?).

My issue is that I can't manage to actually visualize what is happening here.
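The closest I've come to visualizing it is a toy conjugate example I wrote myself (all data and parameter values below are made up for illustration): each posterior sample $\theta^{(i)}$ defines one predictive density, and the expectation averages those densities. For 1-D Bayesian linear regression with known noise, the average can be checked against the closed-form predictive:

```python
# Toy 1-D Bayesian linear regression (synthetic data) to make
# E_{p(theta|D)}[p(y|x,theta)] concrete: average per-sample predictive
# densities over posterior draws and compare with the closed form.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, sigma, v = 50, 0.5, 1.0          # noise std and prior std, assumed known
x = rng.uniform(-2, 2, size=n)
theta_true = 1.3                    # made-up "true" slope
y = theta_true * x + rng.normal(0, sigma, size=n)

# Conjugate posterior p(theta|D) = N(mu_n, s2_n) for this Gaussian model
s2_n = 1.0 / (x @ x / sigma**2 + 1.0 / v**2)
mu_n = s2_n * (x @ y) / sigma**2

# Predictive density at a test point, two ways:
x_star, y_star = 1.0, 1.5
# (a) closed form: N(y*; x* mu_n, sigma^2 + x*^2 s2_n)
exact = stats.norm.pdf(y_star, x_star * mu_n,
                       np.sqrt(sigma**2 + x_star**2 * s2_n))
# (b) Monte Carlo: mean of the per-sample densities N(y*; x* theta_i, sigma^2)
thetas = rng.normal(mu_n, np.sqrt(s2_n), size=100_000)
mc = stats.norm.pdf(y_star, x_star * thetas, sigma).mean()
print(exact, mc)  # the two numbers should agree closely
```

So "weighted by the posterior" seems to mean: regions of $\Theta$ with high posterior mass contribute more of their predictive densities to the mixture. Is that the right picture?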

Also, how can we get to the expression for the predictive distribution in a rigorous mathematical manner?
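My own attempt, using only the sum and product rules (I am not sure every step is fully justified):

$$p(y|x,\mathcal{D}) = \int_{\theta \in \Theta} p(y,\theta|x,\mathcal{D})\,d\theta = \int_{\theta \in \Theta} p(y|x,\mathcal{D},\theta)\,p(\theta|x,\mathcal{D})\,d\theta = \int_{\theta \in \Theta} p(y|x,\mathcal{D},\theta)\,p(\theta|\mathcal{D})\,d\theta,$$

where the last step assumes the posterior over $\theta$ does not depend on the new input $x$. Is this the standard argument?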

Another issue I have: suppose we are not in the convenient case where both $p(\theta)$ and $p(y|x,\mathcal{D},\theta)$ are Gaussian, so that the posterior has no closed form and we can only approximate it (using, for example, Gibbs sampling).

How does the above expression translate in this case?

I guess that in this case we only have access to, say, $m$ samples from the posterior, so that

$$\mathbb{E}_{p(\theta|\mathcal{D})}[p(y|x,\mathcal{D},\theta)] \approx \frac{1}{m}\sum_{i=1}^m p(y|x, \mathcal{D},\theta^{(i)})\,p(\theta^{(i)}|\mathcal{D}).$$

Is this expression correct?
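For concreteness, here is a minimal sketch of what I would actually compute given $m$ posterior draws (the function name and the `likelihood` callable are hypothetical). Note that it weights each sample equally by $1/m$, with no extra posterior factor, which is exactly the part I am unsure about:

```python
# Sketch: given m posterior draws (from Gibbs sampling or any MCMC),
# approximate the posterior predictive by a plain Monte Carlo mean.
# `likelihood` is a hypothetical callable returning p(y | x, theta).
import numpy as np

def predictive_density(y, x, theta_samples, likelihood):
    # Each draw theta^(i) ~ p(theta|D) contributes with equal weight 1/m;
    # my understanding is that the posterior weighting is already done by
    # the sampling itself, but this is what I am asking about.
    vals = np.array([likelihood(y, x, t) for t in theta_samples])
    return vals.mean()

# Example usage with a Gaussian likelihood p(y|x,theta) = N(y; theta*x, 1)
gauss = lambda y, x, t: np.exp(-0.5 * (y - t * x) ** 2) / np.sqrt(2 * np.pi)
samples = np.random.default_rng(1).normal(1.0, 0.1, size=5000)  # stand-in draws
print(predictive_density(0.5, 1.0, samples, gauss))
```

Does this match what the formula above is supposed to mean, or does the $p(\theta^{(i)}|\mathcal{D})$ factor really belong in the sum?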

Many thanks,

James