Metropolis-Hastings not clearly understood

164 Views Asked by At

I am trying to understand how the Metropolis-Hastings algorithm works and, if possible, to build a small example myself (to be sure that I have understood correctly).

Unfortunately, there are still a number of points that bother me in constructing this algorithm.…

For example, in the various explanations that I encounter here or there, it is explained that we want to construct a chain $(X_t)_{t\ge 0}$ with $n$ states whose equilibrium vector is given by a certain vector $(\pi_1, …, \pi_n)$ given in advance (so-called “target distribution”).

Okay. But it’s not indicated if $\pi_i >0, \ \forall i$ or if some elements can be zero... (although I don't see why it could be possible...). Are we actually imposing $\pi_i >0, \ \forall i$? -> [question 1]

We then give ourselves a proposition matrix (generally named $Q$, so-called “transition kernel”), for which it is generally required that it be irreducible, aperiodic and that $q_{i,j}>0 \Longrightarrow q_{j,i}>0 $. But these conditions are not always indicated…. Why are the characteristics of $Q$ not always fully explained? -> [question 2]

Then, we suppose that the chain $(X_t)_{t\ge 0}$ that we want to build is (at a certain time $t$) in the state $X_t =i$, and we propose a next state $X_{t+1}=j$ by first calculating the so-called “acceptance probabilities”:

$$a_{i,j}=\min\left(1, \frac{\pi_j q_{j,i}} {\pi_i q_{i,j}}\right), \forall j \ 1 <= j <= n $$

Then, we say that we accept $j$ with the probability $a_{i,j}$ and that, otherwise, we stay on $X_{t+1}=X_t=i$.

OK, but this raises some other questions:

When we calculate the $ a_{i,j}$, why can't we have $ q_{i,j}=0$? -> [question 3]

If several $ a_{i,j}$ are equal to $1$, how do we make a choice? -> [question 4]

It seems that we must accept $j=1$ with probability $a_{i,1}$, but also $j=2$ with probability $a_{i,2}$, etc. How is it possible ? -> [question 5]

That's a lot of questions about a Metropolis-Hastings approach, but I couldn't find any clear document. Do you know where I could find the answers to these questions to help me progress in understanding the algorithm?

1

There are 1 best solutions below

12
On
  1. You don't have to have $\pi_i>0$ for all $i$, technically. But there is no good reason to spend resources on bookkeeping for a state that you want your chain to never visit. Also, if you define $a_{i,j}$ in your program by an actual division as you described in #3, then having any $\pi_i=0$ will cause a technical problem due to division by zero. If you insist on including such a state in your setup, then this technical problem can be circumvented. Post a comment if you would like me to expand on this technicality.
  2. Irreducibility of the proposal chain is necessary, that should really be asserted or at least be extremely obvious from how it is assembled. Aperiodicity of the proposal chain is not necessary, as aperiodicity of the full M-H chain will happen as long as proposal rejections ever occur, i.e. as long as the proposal chain is not already reversible wrt $\pi$.
  3. $q_{i,j}=0$ will never come up because given the previous state $i$, you produce $j$ by drawing a sample from the distribution of $q_{i,j}$ to begin with. (There is a slight mathematical subtlety about the distinction between "probability zero" and "impossible" going on here. Post a comment if you would like me to expand on this.)
  4. You accept the move to $j$, which at this point has already been proposed, with probability $a_{i,j}$. There is no issue with having multiple $j$ such that $a_{i,j}=1$ (this just means that there are several transitions from $i$ that you are guaranteed to accept).
  5. Again the $a_{i,j}$ are just the acceptance probabilities, which only come up after the proposal to move to $j$ has already been made. The overall transition probability in the full M-H chain is $a_{i,j} q_{i,j}$. Analytically speaking, it is easier to think about M-H that way, i.e. as "a Markov chain with transition probabilities given by $a_{i,j} q_{i,j}$ for $j \neq i$, where $\pi_i$ and $q_{i,j}$ are given and $a_{i,j}$ is computed using this formula". (This observation is actually important in that it verifies that the full M-H chain is actually Markov, which is not entirely obvious from the way it is normally described.)