Metropolis-Hastings algorithm for sampling: calculating the acceptance function


Formula for calculating the acceptance function: What I find confusing is that, to compute the acceptance function, you have to use the stationary distribution. But if you already have the stationary distribution, why not simply sample from it directly rather than run the Metropolis-Hastings algorithm? If you know the parameters of the stationary distribution you want to sample from, couldn't you use roulette-wheel sampling or some other random-number-generation and sampling approach?


Best answer

To expand on my tongue-in-cheek comment: even if you can write down the density of your random variable, it is not immediately obvious how to simulate a set of observations $Y_1, Y_2, \ldots$ from it. A couple of cases in which direct simulation is possible:

  • Your random variable is univariate and you can invert the CDF
  • The density function of the random variable is amenable to accept/reject, importance sampling, slice sampling, or similar methods that work directly with the density function
  • $Y$ is itself a (possibly complex, and hierarchical) transformation of some known random variable which is easy to simulate
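The first case in the list above, inverting the CDF, can be sketched in a few lines. This is an illustrative example rather than anything from the question: for an exponential variable with rate $\lambda$, the CDF $F(y) = 1 - e^{-\lambda y}$ inverts to $F^{-1}(u) = -\log(1-u)/\lambda$, so pushing uniform draws through $F^{-1}$ yields exponential draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inverse-CDF sampling for an Exponential(rate) variable:
# if U ~ Uniform(0, 1), then -log(1 - U) / rate has CDF F(y) = 1 - exp(-rate * y).
def sample_exponential(rate, size, rng):
    u = rng.uniform(size=size)
    return -np.log1p(-u) / rate  # log1p(-u) is a numerically stable log(1 - u)

samples = sample_exponential(2.0, 100_000, rng)
print(samples.mean())  # should be close to 1 / rate = 0.5
```

This trick is exactly what makes the exponential conditionals in the Gibbs example later in the answer so convenient.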

But these cases do not exhaust the class of random variables we might be interested in, and once the dimension of the random variable grows it becomes increasingly unlikely that you will be able to sample from it easily. This situation occurs frequently in Bayesian inference, where we are interested in the posterior distribution of some parameter $\theta$ and usually end up needing to evaluate densities of the form $$ \begin{align} p(\theta | \mathbf{X} ) \propto p(\mathbf{X}|\theta)p_0(\theta), \tag{1} \end{align} $$ having observed a sample $\mathbf{X}$.

At this point we can point out a big advantage of the Metropolis-Hastings method. Using the notation in your question, the acceptance function is $$ \begin{align} A(\theta \rightarrow \theta') = \frac{ \pi(\theta')Q(\theta' \rightarrow \theta) }{ \pi(\theta)Q(\theta \rightarrow \theta') }, \tag{2} \end{align} $$ where $\pi(\theta) = p(\theta | \mathbf{X})$. As it stands we have cheated in writing the density as $(1)$, because we still need to carry out an integration to find the normalising constant $Z = \int_{\Theta} p(\mathbf{X}|\theta)p_0(\theta)\,d\theta$. Doing this may be incredibly hard, because of the dimension or a particularly intractable functional form. But because $(2)$ is a ratio, we do not need to worry: $$ \begin{align*} A(\theta \rightarrow \theta') &=\frac{p(\theta'|\mathbf{X})}{p(\theta|\mathbf{X} )}\frac{ Q(\theta' \rightarrow \theta) }{ Q(\theta \rightarrow \theta') } \\ &= \frac{p(\mathbf{X}|\theta')p_0(\theta')/Z }{p(\mathbf{X}|\theta)p_0(\theta) / Z}\frac{ Q(\theta' \rightarrow \theta) }{ Q(\theta \rightarrow \theta') } \\ &= \frac{p(\mathbf{X}|\theta')p_0(\theta') }{p(\mathbf{X}|\theta)p_0(\theta) }\frac{ Q(\theta' \rightarrow \theta) }{ Q(\theta \rightarrow \theta') }. \end{align*} $$ The normalising constant cancels, and this is a very good reason for considering these types of methods.
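The cancellation of $Z$ means an implementation of Metropolis-Hastings only ever needs to call an unnormalised density. Here is a minimal sketch: the target is an unnormalised $e^{-x^2/2}$ standing in for $p(\mathbf{X}|\theta)p_0(\theta)$, and a symmetric random-walk proposal makes the $Q$-ratio equal to one; the function names and step size are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Unnormalised log-density: exp(-x^2 / 2) up to an unknown constant,
# playing the role of p(X|theta) * p0(theta) in the derivation above.
def log_target(x):
    return -0.5 * x**2

def metropolis_hastings(log_target, x0, n_steps, step, rng):
    x = x0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        x_prop = x + rng.normal(scale=step)  # symmetric proposal, so Q-ratio = 1
        # The acceptance ratio uses only the unnormalised density:
        # the constant Z cancels exactly as in the algebra above.
        if np.log(rng.uniform()) < log_target(x_prop) - log_target(x):
            x = x_prop
        samples[i] = x
    return samples

chain = metropolis_hastings(log_target, 0.0, 50_000, 1.0, rng)
print(chain.mean(), chain.var())
```

Since the target here is in fact a standard normal, the chain's mean and variance should settle near 0 and 1, even though the code never sees the normalising constant $1/\sqrt{2\pi}$.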

As another example, consider the density $$ p(x_1, x_2, x_3) \propto \exp\left\{-(x_1 + x_2 + x_3 + \alpha x_1 x_2 + \beta x_2 x_3 + \gamma x_3 x_1) \right\}, \qquad x_i \geq 0. $$ How would you sample from it? You could try writing it as $p(x_1,x_2,x_3) = p(x_3, x_2 | x_1)p(x_1)$, but one can show that the marginal $p(x_1)$ has the form $$ p(x_1) \propto e^{-x_1}\int_0^{\infty}\frac{e^{-x_2 - \alpha x_1 x_2 }}{1 + \beta x_2 + \gamma x_1 } \,dx_2, $$ which is already complicated, and we would still need to work out $p(x_3,x_2|x_1)$. Now consider Gibbs sampling, a special case of Metropolis-Hastings that requires only that we can simulate from the full conditionals $p(x_1 | x_2, x_3)$ and so on. Here $$ p(x_1 | x_2, x_3 ) \propto \exp\left\{-(1 + \alpha x_2 + \gamma x_3 )x_1\right\}, $$ which is just an exponential random variable, very easy to sample from because it satisfies the invertible-CDF property mentioned earlier. So Metropolis-Hastings (Gibbs) sampling makes this problem much, much easier than trying to simulate $x_1$ and then $x_2,x_3 | x_1$, or some similar strategy.
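The Gibbs scheme for this density can be sketched directly, since by symmetry all three full conditionals are exponential with rates $(1+\alpha x_2+\gamma x_3)$, $(1+\alpha x_1+\beta x_3)$, and $(1+\beta x_2+\gamma x_1)$. The parameter values and starting point below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, gamma = 1.0, 2.0, 0.5  # example coupling parameters

def gibbs(n_steps, rng):
    x1 = x2 = x3 = 1.0  # arbitrary starting point in the positive orthant
    out = np.empty((n_steps, 3))
    for i in range(n_steps):
        # Each full conditional is Exponential with a rate depending on the
        # other two coordinates, so each update is an exact draw.
        # NumPy's exponential() takes scale = 1 / rate.
        x1 = rng.exponential(1.0 / (1.0 + alpha * x2 + gamma * x3))
        x2 = rng.exponential(1.0 / (1.0 + alpha * x1 + beta * x3))
        x3 = rng.exponential(1.0 / (1.0 + beta * x2 + gamma * x1))
        out[i] = (x1, x2, x3)
    return out

draws = gibbs(20_000, rng)
print(draws.mean(axis=0))
```

No accept/reject step appears because Gibbs updates are always accepted; the Metropolis-Hastings acceptance ratio for a full-conditional proposal is identically one.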

Hopefully this adds some motivation for why sampling methods are required. The price we pay for their flexibility is that we construct chains that only asymptotically sample from the desired distribution, so we must then address the convergence of these chains, the correlation between samples, and similar issues.