Do individual Metropolis-Hastings maps preserve the target measure?


Consider the probability space $(Q,\mathcal{B}(Q),\pi)$, where $Q \subseteq \mathbb{R}$ is a sample space, $\mathcal{B}(Q)$ is the Borel $\sigma$-algebra on $Q$, and $\pi$ is some probability measure to be sampled.

Then consider the Metropolis-Hastings map on the sample space $Q$:

$$t(q) = q + \varepsilon \cdot\mathbb{1}\left[\eta < \frac{f(q+\varepsilon)}{f(q)}\right].$$

Here, $f(q)$ is the density of $\pi$ on $Q$ and $\mathbb{1}$ is the indicator function. The parameter $\varepsilon$ follows the normal distribution $\varepsilon \sim N(0,1)$ and the parameter $\eta$ follows the uniform distribution $\eta \sim U(0,1)$. This defines the Metropolis-Hastings random walk.

Here is what I want to prove (or disprove). For given values of $\varepsilon$ and $\eta$, prove that the map preserves the target measure:

$$t_*\pi(A) \equiv \pi[t^{-1}(A)] = \pi(A) \Rightarrow t_*\pi = \pi$$

Here, $t_*\pi$ is the image measure.

It is obvious that the target measure will be preserved if I average over $\varepsilon$ and $\eta$. But I want to know whether this can be said of each individual map in the above sense. I was led to believe (pp. 2260-2262) that this should be the case. Is it true? And if so, how do I prove it?


There are 2 best solutions below

Best answer

I think that the answer is no. This is not a measure-preserving map.

First, this is not a bijective map. Given a final value $q' = t(q)$ and the values $\varepsilon$ and $\eta$, the initial value $q$ cannot be uniquely inferred in general. This is because the value $q' = t(q)$ can be obtained from an accepted proposal or a rejected proposal, and it is in general not possible to tell from which.

Here is an explicit example. Consider the Gaussian density:

$$f(q) = \frac{1}{\sqrt{2\pi}}\exp(-q^2/2).$$

Choose $\varepsilon > 0$ and

$$\eta = \exp(-\varepsilon^2/2).$$

Then suppose that the final point is located at some value:

$$q' = t(q) = x \varepsilon,$$

for some $0 < x < 1$. With these values, you can convince yourself that the two initial points

$$q = x\varepsilon \;\;\textrm{and}\;\; q = (x-1)\varepsilon,$$

give the same value $q' = x \varepsilon$. The first is for rejection and the second is for acceptance.
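This collision can be checked numerically. The following is a minimal sketch that assumes the standard Gaussian target of this example; the function name `mh_map` and the values $\varepsilon = 1.5$, $x = 0.3$ are illustrative choices only.

```python
import math

def mh_map(q, eps, eta):
    """Single Metropolis-Hastings map t(q) for fixed eps and eta.

    Assumes the standard Gaussian target, f(q) proportional to exp(-q^2 / 2).
    """
    # log of f(q + eps) / f(q); the normalising constant cancels
    log_ratio = -0.5 * (q + eps) ** 2 + 0.5 * q ** 2
    accept = math.log(eta) < log_ratio
    return q + eps if accept else q

eps = 1.5                       # illustrative, any eps > 0 works
eta = math.exp(-eps ** 2 / 2)   # the choice of eta made in the example
x = 0.3                         # illustrative, any 0 < x < 1

q_rej = x * eps          # proposal is rejected, the point stays put
q_acc = (x - 1) * eps    # proposal is accepted, the point moves by eps

# Both calls return x * eps, up to floating-point rounding.
print(mh_map(q_rej, eps, eta), mh_map(q_acc, eps, eta))
```

With this $\eta$, the acceptance condition reduces to $q < 0$, which is why the negative starting point moves and the positive one does not.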

Therefore, the map is not injective: the preimage $t^{-1}(A)$ of a Borel set $A$ can contain both points whose proposal was rejected and points whose proposal was accepted.

Nor is the measure preserved. Consider an interval $A = [x \varepsilon,(x + dx)\varepsilon]$. In the limit of small $dx$, its measure under $\pi$ is approximately:

$$\pi(A) \approx \frac{1}{\sqrt{2\pi}}\exp(-x^2 \varepsilon^2/2) \varepsilon\, dx$$

Two sets that $t$ maps onto $A$, one by rejection and one by acceptance, are the intervals:

$$B_{\textrm{rej}} = [x\varepsilon,(x+dx)\varepsilon] \;\;\textrm{and}\;\; B_{\textrm{acc}} = [(x-1)\varepsilon,(x + dx - 1)\varepsilon].$$

In the limit of small $dx$, their measure under $\pi$ is

$$\pi(B_{\textrm{rej}}) \approx \frac{1}{\sqrt{2\pi}}\exp(-x^2 \varepsilon^2/2) \varepsilon\, dx \;\;\textrm{and}\;\; \pi(B_{\textrm{acc}}) \approx \frac{1}{\sqrt{2\pi}}\exp[-(x-1)^2 \varepsilon^2/2] \varepsilon\, dx$$

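Note that while $\pi(B_{\textrm{rej}}) = \pi(A)$, we have $\pi(B_{\textrm{acc}}) \neq \pi(A)$ for $x \neq 1/2$. Moreover, with this choice of $\eta$ the proposal is accepted exactly when $q < 0$, so both intervals belong to $t^{-1}(A)$ and $\pi[t^{-1}(A)] \approx \pi(B_{\textrm{rej}}) + \pi(B_{\textrm{acc}}) > \pi(A)$. Therefore, the map $t(q)$ is not a measure-preserving map, at least not always.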
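As a numerical cross-check, the sketch below estimates $\pi[t^{-1}(A)]$ directly, by integrating the density over every $q$ with $t(q) \in A$, and compares it with $\pi(A)$. It assumes the standard Gaussian target of this example; the helper names, the grid bounds, and the values $\varepsilon = 1.5$, $x = 0.3$, $dx = 0.1$ are illustrative assumptions.

```python
import math

def f(q):
    """Standard Gaussian density, as in the example."""
    return math.exp(-q * q / 2) / math.sqrt(2 * math.pi)

def mh_map(q, eps, eta):
    """Single map t(q) for fixed eps and eta."""
    return q + eps if eta < f(q + eps) / f(q) else q

def gaussian_cdf(q):
    """CDF of the standard Gaussian, via the error function."""
    return 0.5 * (1 + math.erf(q / math.sqrt(2)))

def preimage_mass(a, b, eps, eta, lo=-10.0, hi=10.0, n=200_000):
    """Midpoint-rule estimate of pi(t^{-1}([a, b]))."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        q = lo + (i + 0.5) * h
        if a <= mh_map(q, eps, eta) <= b:
            total += f(q) * h
    return total

eps = 1.5
eta = math.exp(-eps ** 2 / 2)
x, dx = 0.3, 0.1
a, b = x * eps, (x + dx) * eps            # the interval A

mass_A = gaussian_cdf(b) - gaussian_cdf(a)                 # pi(A) = pi(B_rej)
mass_acc = gaussian_cdf(b - eps) - gaussian_cdf(a - eps)   # pi(B_acc)
mass_pre = preimage_mass(a, b, eps, eta)                   # pi(t^{-1}(A))

print("pi(A)             =", mass_A)
print("pi(B_rej)+pi(B_acc) =", mass_A + mass_acc)
print("pi(t^{-1}(A))     =", mass_pre)
```

The grid estimate should agree with $\pi(B_{\textrm{rej}}) + \pi(B_{\textrm{acc}})$ up to discretisation error and exceed $\pi(A)$, since the grid approach makes no assumption about which points land in $A$.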

Second answer

Happy to be corrected here, but the maps are just translations: either you don't move or you move by $\epsilon$, so they are definitely measure preserving. The densities of $\eta$ and $\epsilon$ are given so that you integrate over the (parameterised) paths, so something like

$$ \int \epsilon \left[ \int \mathbb{I} \left( \eta < \frac{f(q + \epsilon)}{f(q)}\right) \mathrm{d}\eta\right] \mathrm{d}\epsilon = \int \epsilon \left[ 1 \wedge \frac{f(q + \epsilon)}{f(q)} \right] \mathrm{d}\epsilon $$

which we can write as

$$ \int \alpha(z, z')p(z,z')\mathrm{d}z' $$

where

$$ \alpha(z, z') = 1 \wedge r(z, z') \quad \textrm{and} \quad r(z, z') = \frac{f(z')p(z',z)}{f(z)p(z, z')} $$

And this is the familiar Metropolis-Hastings kernel with a Gaussian random walk.
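The reason this averaged kernel leaves $\pi$ invariant is that it satisfies detailed balance, $f(z)\,p(z,z')\,\alpha(z,z') = f(z')\,p(z',z)\,\alpha(z',z)$. The sketch below checks that identity at a few points. It assumes a standard Gaussian target and a unit-variance Gaussian proposal; the test points and the helper name `flux` are illustrative.

```python
import math

def f(z):
    """Target density; a standard Gaussian is assumed for illustration."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def p(z, z_new):
    """Gaussian random-walk proposal density, z_new = z + N(0, 1)."""
    return math.exp(-(z_new - z) ** 2 / 2) / math.sqrt(2 * math.pi)

def alpha(z, z_new):
    """Acceptance probability, the minimum of 1 and r(z, z')."""
    r = (f(z_new) * p(z_new, z)) / (f(z) * p(z, z_new))
    return min(1.0, r)

def flux(z, z_new):
    """Probability flow density from z to z_new under the MH kernel."""
    return f(z) * p(z, z_new) * alpha(z, z_new)

# Arbitrary pairs of distinct points; detailed balance should hold for all.
pairs = [(-1.2, 0.4), (0.0, 2.5), (0.7, -0.3)]
for z, z_new in pairs:
    print(z, z_new, flux(z, z_new), flux(z_new, z))
```

The forward and reverse flows in each printed row should match up to rounding. This concerns only the kernel averaged over $\epsilon$ and $\eta$; it says nothing about a single map with both parameters held fixed.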

Apologies for missing out some steps and not considering the case where the chain stays where it is.