Page 74 of Pattern Recognition and Machine Learning (freely available) says
... we see that as the number of observations increases, so the posterior distribution becomes more sharply peaked ... we can take a frequentist view of Bayesian learning and show that, on average, such a property does indeed hold. Consider a general Bayesian inference problem for a parameter θ for which we have observed a data set D, described by the joint distribution p(θ,D). The following result says that the posterior mean of θ, averaged over the distribution generating the data, is equal to the prior mean of θ.
Where does equation (2.21) come from? Why is the left-hand side equal to the right-hand side?

This is the law of total expectation (also called the tower property): the marginal mean of $\theta$ equals the average of its conditional means.
$$\mathbb{E}_\theta[\theta]=\int_{\Theta}\theta\, f_\Theta(\theta)\,d\theta=\int_{\Theta}\theta\left[\int_{\mathcal{D}}f_{\Theta,\mathcal{D}}(\theta,x)\,dx\right]d\theta=\int_{\mathcal{D}}\left[\int_\Theta\theta\, f_{\Theta,\mathcal{D}}(\theta,x)\,d\theta\right]dx$$
$$=\int_{\mathcal{D}}\left[\int_\Theta\theta\, f_{\Theta\mid\mathcal{D}}(\theta\mid x)\,d\theta\right]f_\mathcal{D}(x)\,dx=\int_{\mathcal{D}}\mathbb{E}_\theta[\theta\mid\mathcal{D}=x]\,f_\mathcal{D}(x)\,dx=\mathbb{E}_\mathcal{D}\bigl[\mathbb{E}_\theta[\theta\mid\mathcal{D}]\bigr],$$
where the joint density has been factored as $f_{\Theta,\mathcal{D}}(\theta,x)=f_{\Theta\mid\mathcal{D}}(\theta\mid x)\,f_\mathcal{D}(x)$, so the inner integral is exactly the posterior mean given $\mathcal{D}=x$, and the outer integral averages it over the distribution generating the data.
The manipulations are not fully rigorous (swapping the order of integration is justified by Fubini's theorem, which requires $\mathbb{E}|\theta|<\infty$), but the idea is the same.
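The identity is easy to check numerically. Below is a minimal Monte Carlo sketch (not from the book) using a Beta–Bernoulli model, where the posterior mean is available in closed form: sampling $(\theta, \mathcal{D})$ from the joint and averaging the posterior mean $\mathbb{E}[\theta\mid\mathcal{D}]$ should recover the prior mean $a/(a+b)$. The parameter choices are arbitrary.

```python
import random

random.seed(0)

a, b = 2.0, 3.0      # Beta(a, b) prior on theta; prior mean = a / (a + b) = 0.4
N = 10               # Bernoulli observations per simulated data set
trials = 200_000     # Monte Carlo draws from the joint p(theta, D)

total_post_mean = 0.0
for _ in range(trials):
    theta = random.betavariate(a, b)                     # theta ~ prior
    k = sum(random.random() < theta for _ in range(N))   # D: N Bernoulli(theta) draws
    total_post_mean += (a + k) / (a + b + N)             # posterior mean E[theta | D]

# Average posterior mean over data sets; should be close to the prior mean 0.4.
print(total_post_mean / trials)
```

The average of the posterior means converges to the prior mean as the number of trials grows, which is precisely the statement of equation (2.21).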