Rigorous Derivation of Metropolis-Hastings Transition Density


The Metropolis-Hastings MCMC algorithm is as follows. Set $X_0$ to some initial value in the support of the target density $f$ and choose a proposal density $q(y \mid x)$, i.e., a density in $y$ for each $x$ in the support of $f$. For each time $t \in \mathbb{N}$, do the following:

  1. Generate a candidate point $Y_t \sim q(\cdot \mid X_t)$
  2. Set $$ X_{t+1} = \begin{cases} Y_t, \text{ with probability } \rho(X_t, Y_t) \\ X_t, \text{ with probability } 1 - \rho(X_t, Y_t), \end{cases} $$ where $$ \rho(x,y) = \min\left\{\frac{f(y)q(x \mid y)}{f(x)q(y \mid x)}, 1\right\}. $$
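The two steps above can be sketched in code. Everything concrete below (a standard normal target, a Gaussian random-walk proposal, the step count) is an assumed illustration, not part of the original algorithm statement:

```python
import numpy as np

def metropolis_hastings(f, q_sample, q_density, x0, n_steps, rng):
    """Run the two-step MH update: propose Y ~ q(.|x), accept with prob rho(x, Y)."""
    x = x0
    chain = [x]
    for _ in range(n_steps):
        y = q_sample(x)
        # rho(x, y) = min{ f(y) q(x|y) / (f(x) q(y|x)), 1 }
        rho = min(f(y) * q_density(x, y) / (f(x) * q_density(y, x)), 1.0)
        x = y if rng.random() < rho else x
        chain.append(x)
    return np.array(chain)

rng = np.random.default_rng(0)
f = lambda x: np.exp(-x**2 / 2)                   # unnormalized N(0, 1) target (assumed)
q_sample = lambda x: x + rng.normal()             # random-walk proposal N(x, 1) (assumed)
q_density = lambda y, x: np.exp(-(y - x)**2 / 2)  # q(y|x); symmetric in (x, y)

chain = metropolis_hastings(f, q_sample, q_density, 0.0, 50_000, rng)
print(round(chain.mean(), 2), round(chain.std(), 2))  # should be near 0 and 1
```

Since this proposal is symmetric, $\rho$ reduces to $\min\{f(y)/f(x), 1\}$, but the code keeps the general ratio from step 2.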

These two steps define a Markov chain $\{X_t\}$ on an uncountable state space $\mathcal{S}$. Our professor wrote the transition density $\pi$ for this Markov chain as $$ \pi(x,y) = q(y \mid x)\rho(x,y) + (1 - r(x))\delta_x(y), $$ where $r(x) = \int_{\mathcal{S}} q(y \mid x )\rho(x,y)\, dy$. I was unable to verify for myself that this is indeed the transition density. Is there a rigorous derivation/proof of this available?
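One quick consistency check (not yet the full derivation) is that $\pi(x, \cdot)$ integrates to one over the state space, directly from the definition of $r(x)$:

$$ \int_{\mathcal{S}} \pi(x,y)\,dy = \underbrace{\int_{\mathcal{S}} q(y \mid x)\rho(x,y)\,dy}_{=\,r(x)} + (1 - r(x))\underbrace{\int_{\mathcal{S}} \delta_x(y)\,dy}_{=\,1} = r(x) + 1 - r(x) = 1. $$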

I know I need to show the following. Let $f_t$ be the marginal density of the chain at time $t$, and let $P( \cdot \mid X_t = x) : \mathcal{B}(\mathcal{S}) \to [0,1]$ denote the conditional distribution of $X_{t+1}$; a probability measure on the state space for each $x \in \mathcal{S}$. For any sets $A,B \in \mathcal{B}(\mathcal{S})$, the conditional distribution is defined so as to satisfy $$ P(X_{t+1} \in A, X_t \in B) = \int_B f_t(x)P(X_{t+1} \in A \mid X_t = x) \, dx. $$ Thus I need to show $$ P(X_{t+1} \in A, X_t \in B) = \int_B f_t(x) \left(\int_A \pi(x,y) \,dy \right)\, dx. $$ For this I am at sea.
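This identity can at least be checked numerically (not rigorously) by picking concrete densities and comparing both sides; all distributional choices below, Gaussian target, proposal, and marginal $f_t$, are assumptions made for illustration:

```python
import numpy as np

# Assumed setup: target f = N(0,1), proposal q(y|x) = N(x,1), marginal f_t = N(0,2).
f   = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
q   = lambda y, x: np.exp(-(y - x)**2 / 2) / np.sqrt(2 * np.pi)
ft  = lambda x: np.exp(-x**2 / 4) / np.sqrt(4 * np.pi)
rho = lambda x, y: np.minimum(f(y) * q(x, y) / (f(x) * q(y, x)), 1.0)

def trap(vals, grid):
    # trapezoidal rule (written out to stay NumPy-version-safe)
    return float(np.sum((vals[1:] + vals[:-1]) / 2 * np.diff(grid)))

A, B = (0.5, 1.5), (-1.0, 1.0)

# Left-hand side: Monte Carlo estimate of P(X_{t+1} in A, X_t in B).
rng = np.random.default_rng(0)
n = 200_000
xt  = rng.normal(scale=np.sqrt(2), size=n)            # X_t ~ f_t
y   = xt + rng.normal(size=n)                         # Y_t ~ q(.|X_t)
xt1 = np.where(rng.random(n) < rho(xt, y), y, xt)     # one MH step
lhs = np.mean((A[0] < xt1) & (xt1 < A[1]) & (B[0] < xt) & (xt < B[1]))

# Right-hand side: int_B f_t(x) [ int_A q(y|x) rho(x,y) dy + (1 - r(x)) 1_A(x) ] dx.
ys = np.linspace(-10, 10, 4001)
def inner(x):
    acc  = q(ys, x) * rho(x, ys)
    r_x  = trap(acc, ys)                                  # r(x)
    move = trap(acc * ((A[0] < ys) & (ys < A[1])), ys)    # density part over A
    return move + (1 - r_x) * (A[0] < x < A[1])           # atom at x
xs  = np.linspace(B[0], B[1], 801)
rhs = trap(np.array([ft(x) * inner(x) for x in xs]), xs)

print(lhs, rhs)  # the two numbers should agree to a couple of decimal places
```

Agreement here is of course only evidence, not a proof; it just confirms the atom $(1 - r(x))\delta_x(y)$ is accounted for correctly.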

There is 1 answer below.

To simplify things, suppose your state space is $\mathbb{R}^n$. If you are at $x\in\mathbb{R}^n$, there are two ways of reaching a point $x'$ in the set $A\subseteq \mathbb{R}^n$:

  • You can propose a candidate $x'\in A$ by sampling from the proposal distribution, $x'\sim q(x, x')$, and then accept it with probability $\alpha(x'\mid x)$.
  • Otherwise, you can reject whatever is proposed; then the next state is again $x$, and you are "lucky" only if the current state $x$ already lies in $A$.

This means your kernel is given by

$$ K(x, A) = \underbrace{\int_Aq(x, x') \alpha(x' \mid x) dx'}_{\text{Probability of moving to a state in $A$}} + \underbrace{\left(1 - \int_{\mathbb{R}^n} q(x, x')\alpha(x' \mid x) dx'\right) \mathbb{I}_{A}(x)}_{\text{Probability of staying at $x$ when $x\in A$}} $$

Let's inspect this:

  • In the first term we are summing over all candidates $x'\in A$ and the integrand is the probability that we move to $x'$ given that we were in $x$ and we proposed $x' \sim q(x, x')$.
  • In the second term we first compute the probability of accepting a move away from $x$ as $$ a(x) = \int_{\mathbb{R}^n} q(x, x') \alpha(x'\mid x) dx' $$ and then by taking $1-a(x)$ we compute the probability of rejection. We then multiply this by an indicator function that is $1$ only when $x\in A$: $$ \mathbb{I}_A(x) = \begin{cases} 1 & x \in A \\ 0 & x \notin A \end{cases} $$ These are the only two ways to move from a state $x\in\mathbb{R}^n$ to any state in $A\subseteq \mathbb{R}^n$.
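This two-term formula can also be checked numerically at a single point: simulate one MH step from a fixed $x$ many times, and compare the empirical frequency of landing in $A$ with the quadrature value of $K(x, A)$. The Gaussian target and proposal below are assumptions for illustration:

```python
import numpy as np

# Assumed setup: target f = N(0,1), proposal q(x, .) = N(x, 1),
# so alpha(x'|x) = min{f(x')/f(x), 1} (the symmetric-proposal case).
f     = lambda x: np.exp(-x**2 / 2)
q     = lambda x, xp: np.exp(-(xp - x)**2 / 2) / np.sqrt(2 * np.pi)
alpha = lambda xp, x: np.minimum(f(xp) / f(x), 1.0)

def trap(vals, grid):
    # trapezoidal rule (written out to stay NumPy-version-safe)
    return float(np.sum((vals[1:] + vals[:-1]) / 2 * np.diff(grid)))

x, A = 1.0, (0.5, 2.0)   # x lies inside A, so both terms of K contribute

# Empirical: repeat a single MH step from x.
rng = np.random.default_rng(1)
n = 500_000
xp = x + rng.normal(size=n)
xnext = np.where(rng.random(n) < alpha(xp, x), xp, x)
empirical = np.mean((A[0] < xnext) & (xnext < A[1]))

# Formula: K(x, A) = int_A q(x,x') alpha(x'|x) dx' + (1 - a(x)) * 1_A(x).
grid = np.linspace(-10, 10, 20001)
acc = q(x, grid) * alpha(grid, x)
a_x = trap(acc, grid)                                      # a(x): prob. of accepting
K = trap(acc * ((A[0] < grid) & (grid < A[1])), grid) + (1 - a_x) * (A[0] < x < A[1])

print(empirical, K)  # should agree to a couple of decimal places
```

Dropping the indicator term (i.e., ignoring the atom at $x$) makes the two numbers disagree noticeably, which is a quick way to see why the $\delta_x$ part of the kernel is needed.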

Does it make sense?