How to get the expectation of a parameter θ from a joint distribution p(θ,D) in the context of Bayesian learning?


Page 74 of *Pattern Recognition and Machine Learning* (Bishop) says:

... we see that as the number of observations increases, so the posterior distribution becomes more sharply peaked ... we can take a frequentist view of Bayesian learning and show that, on average, such a property does indeed hold. Consider a general Bayesian inference problem for a parameter θ for which we have observed a data set D, described by the joint distribution p(θ,D). The following result says that the posterior mean of θ, averaged over the distribution generating the data, is equal to the prior mean of θ.

$$\mathbb{E}_\theta[\theta]=\mathbb{E}_\mathcal{D}\bigl[\mathbb{E}_\theta[\theta\mid\mathcal{D}]\bigr]\tag{2.21}$$

Where does equation (2.21) come from? Why is the left side equal to the right side?


This is the law of total expectation (also called the tower property), which follows from marginalizing the joint density:

$$\mathbb{E}_\theta[\theta]=\int_{\Theta}\theta\, f_\Theta(\theta)\,d\theta=\int_{\Theta}\theta\left[\int_{\mathcal{D}}f_{\Theta,\mathcal{D}}(\theta,x)\,dx\right]d\theta.$$

Swapping the order of integration (Fubini's theorem) and factoring the joint density as $f_{\Theta,\mathcal{D}}(\theta,x)=f_{\Theta\mid\mathcal{D}}(\theta\mid x)\,f_{\mathcal{D}}(x)$ gives

$$=\int_{\mathcal{D}}\left[\int_\Theta \theta\, f_{\Theta\mid\mathcal{D}}(\theta\mid x)\,d\theta\right]f_{\mathcal{D}}(x)\,dx=\int_{\mathcal{D}}\mathbb{E}_\theta[\theta\mid\mathcal{D}=x]\,f_{\mathcal{D}}(x)\,dx=\mathbb{E}_\mathcal{D}\bigl[\mathbb{E}_\theta[\theta\mid\mathcal{D}]\bigr].$$

The interchange of integrals requires $\mathbb{E}[|\theta|]<\infty$ (Fubini's theorem), but the idea carries over to general distributions.
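The identity is easy to check numerically. Below is a minimal Monte Carlo sketch (the Beta-Bernoulli model, prior parameters, and sample sizes are illustrative choices, not from the question): draw $\theta$ from a Beta prior, generate a Bernoulli dataset from it, compute the conjugate posterior mean, and average over many simulated datasets. The average should match the prior mean $a/(a+b)$.

```python
import random

random.seed(0)

a, b = 2.0, 3.0      # Beta prior hyperparameters (illustrative)
n = 5                # observations per simulated dataset
trials = 200_000     # number of datasets to average over

total = 0.0
for _ in range(trials):
    theta = random.betavariate(a, b)                     # θ ~ prior p(θ)
    k = sum(random.random() < theta for _ in range(n))   # Bernoulli(θ) data
    total += (a + k) / (a + b + n)                       # conjugate posterior mean E[θ | D]

avg_posterior_mean = total / trials   # Monte Carlo estimate of E_D[E_θ[θ | D]]
prior_mean = a / (a + b)              # E_θ[θ] = 0.4 here

print(prior_mean, avg_posterior_mean)
```

The two printed numbers agree up to Monte Carlo error, exactly as (2.21) predicts: averaging posterior means over datasets generated from the prior recovers the prior mean.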