Simple example of numerical integration using the Metropolis-Hastings algorithm


Let

  • $(E,\mathcal E)$ be a measurable space
  • $\kappa$ be a Markov kernel with source and target $(E,\mathcal E)$
  • $\mu$ be a probability measure on $(E,\mathcal E)$

I know how the Metropolis-Hastings algorithm works in the case where $E$ is countable and $\mathcal E$ is the power set of $E$.

Now, assume we want to use the Metropolis-Hastings algorithm to numerically integrate the function $$f(x,y):=\begin{cases}1&\text{, if }x^2+y^2\le1\\0&\text{, otherwise}\end{cases}$$ over $E=[-1,1]^2$ with respect to the two-dimensional Lebesgue measure $\lambda^2$. Clearly, $$\int_Ef\:{\rm d}\lambda^2=\pi.$$ As in the case of a countable $E$, the idea should be to construct a "proposal" (time-homogeneous) Markov chain $(X_n)_{n\in\mathbb N_0}$ with transition kernel $\kappa$. The algorithm then produces a Markov chain with stationary distribution $\mu$.

But what's an example of a choice for $\kappa$ and $\mu$ here?

I assume that $\kappa$ can't be arbitrary. I guess we need to impose some "aperiodicity", "irreducibility" and/or "reversibility" assumptions on the objects.

I've searched for quite a long time, but couldn't find a single simple numerical integration example like this.

Best answer:

Let $\mathbf{x} =(x,y)$ because it might be easier to write and look at.

You don't choose $\kappa$ directly; you choose a proposal density $q$, and the MH transition kernel is built on top of this $q$ (the resulting chain is still time-homogeneous, since the same kernel is applied at every step). You also don't really choose $\mu$: it's just the probability distribution of interest. For instance, in Bayesian statistics, it may be your model's posterior distribution.

At each iteration, you sample a proposal $\mathbf{y}$ given your current position $\mathbf{x}$ from the proposal density $q(\mathbf{x},\cdot)$. You also need to be able to evaluate $q$ and to evaluate a (possibly unnormalized) density proportional to $\mu$; the acceptance probability is built from these. So, at each iteration you draw $\mathbf{y}$ from $q(\mathbf{x},\cdot)$, calculate $a(\mathbf{x},\mathbf{y})$, and either "accept" the proposal (set $\mathbf{x}' = \mathbf{y}$) or "reject" it (set $\mathbf{x}' = \mathbf{x}$).

In your example,

  1. the target probability distribution is $\mu(A) = \pi^{-1} \iint_A f(\mathbf{x}) d\mathbf{x}$, and
  2. $\kappa(\mathbf{x},A) = \iint_{A}q(\mathbf{x},\mathbf{x}')a(\mathbf{x},\mathbf{x}')\,{\rm d}\lambda^2 + \mathbb{1}_{A}(\mathbf{x})\,r(\mathbf{x})$, where $q$ is some user-chosen proposal density, $$ a(\mathbf{x},\mathbf{x}') = \min\left(1, \frac{\mu(\mathbf{x}')q(\mathbf{x}', \mathbf{x})}{\mu(\mathbf{x}) q(\mathbf{x}, \mathbf{x}')} \right) $$ is the acceptance probability (with a slight abuse of notation, $\mu(\cdot)$ in this ratio denotes the density of $\mu$), and $$ r(\mathbf{x}) = \iint_{E} q(\mathbf{x}, \mathbf{s}) \left[1 - a(\mathbf{x},\mathbf{s})\right] {\rm d}\lambda^2(\mathbf{s}) $$ is the probability of not moving from $\mathbf{x}$.

To give you some more examples of how $q$ is chosen, sometimes it is chosen to be $$ q(\mathbf{x},\mathbf{x}') = \text{Normal}(\mathbf{x}' ; \mathbf{x}, \Sigma). $$ This is the Random Walk Metropolis-Hastings algorithm; since this proposal is symmetric, the $q$-terms cancel in the acceptance ratio. You have to pick $\Sigma$ carefully.
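To make this concrete for your disk example, here is a minimal Python sketch of random-walk MH (the function name `rwmh` and the choice $\sigma = 0.5$ are my own, not canonical). The target is the unnormalized density $f$, the indicator of the unit disk; because the Gaussian proposal is symmetric, the acceptance probability reduces to $\min(1, f(\mathbf{x}')/f(\mathbf{x}))$.

```python
import random

def f(x, y):
    """Unnormalized target density: indicator of the unit disk."""
    return 1.0 if x * x + y * y <= 1.0 else 0.0

def rwmh(n_steps, sigma=0.5, seed=0):
    """Random-walk Metropolis: Gaussian proposal centered at the current
    state. The proposal is symmetric, so q cancels in the acceptance ratio."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0                        # start inside the disk, so f > 0
    samples = []
    for _ in range(n_steps):
        xp = x + rng.gauss(0.0, sigma)     # propose x' ~ Normal(x, sigma^2 I)
        yp = y + rng.gauss(0.0, sigma)
        a = min(1.0, f(xp, yp) / f(x, y))  # acceptance probability
        if rng.random() < a:
            x, y = xp, yp                  # accept
        samples.append((x, y))             # on rejection, repeat the old state
    return samples
```

The chain never leaves the disk (a proposal outside has $f = 0$ and is always rejected), and its samples are, in the long run, uniform on the disk.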

It's also common to pick $q(\mathbf{x},\mathbf{x}') = q(\mathbf{x}')$; or in other words, draw independently from where you're currently at. This is called the independent Metropolis-Hastings sampler. For your problem, here, you might consider setting $q(\mathbf{x}')$ to be a Uniform distribution on the square that covers your circle.
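Here is a sketch of that independent sampler in Python (the function name `indep_mh_pi` is my own). Since $q$ is a constant density on the square and the target density is the indicator $f$, the acceptance probability is $\min(1, f(\mathbf{x}')/f(\mathbf{x}))$: a proposal is accepted exactly when it lands in the disk. The long-run acceptance rate therefore estimates $\pi/4$, which gives an estimate of the integral.

```python
import random

def indep_mh_pi(n_steps, seed=0):
    """Independent MH: proposals are uniform on the square [-1, 1]^2 and the
    target is proportional to the indicator of the unit disk. Starting inside
    the disk, a proposal is accepted iff it lands in the disk, so the
    acceptance rate estimates pi/4 (area of disk / area of square)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    accepted = 0
    for _ in range(n_steps):
        xp = rng.uniform(-1.0, 1.0)        # proposal ignores the current state
        yp = rng.uniform(-1.0, 1.0)
        if xp * xp + yp * yp <= 1.0:       # a(x, x') = 1: accept
            x, y = xp, yp
            accepted += 1
        # otherwise a(x, x') = 0: stay put
    return 4.0 * accepted / n_steps        # estimate of the integral, pi
```

For this particular target the sampler degenerates into plain rejection-style Monte Carlo, which is exactly why it makes a transparent first example.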

Last, you usually don't need to worry about irreducibility and aperiodicity, because they hold for well-known strategies like the ones above. For instance, the chain is $\mu$-irreducible if $\mu(\mathbf{y}) > 0 \implies q(\mathbf{x}, \mathbf{y}) > 0$ for all $\mathbf{x}, \mathbf{y}$, and since there is positive probability of staying put at each iteration, it is aperiodic. MH chains are reversible with respect to $\mu$ by construction, too.