How to calculate the transition matrix in Markov sampling (example)?


Let's say you can simulate a discrete uniform distribution on $\{0,1\}$ (like a coin toss), with $P\{0\} = P\{1\} = 0.5$.

Now we would like to simulate a distribution on $S = \{1,2,3\}$ with $P_Z\{1\} = 0.2$ and $P_Z\{2\} = P_Z\{3\} = 0.4$.

How can you calculate a transition matrix $M$ which has $P_Z$ as its stationary distribution? This example is taken from the German Wikipedia article "MCMC-Verfahren": https://de.wikipedia.org/wiki/MCMC-Verfahren.

I don't understand how the transition matrix $M$ is formed. Which rules are applied to calculate the transition probabilities $M_{ij}$?

The solution is: $M = \begin{bmatrix}0&1&0\\0&0.5&0.5 \\ 0.5&0&0.5\end{bmatrix}$

Any help is appreciated.

Best answer:

There are many answers on this site dealing with Markov chain Monte Carlo, and a rigorous introduction can be found in many textbooks, but I guess that you are looking for some background context.

There is, in general, considerable choice of transition matrices to generate a desired distribution. So there is no unique prescription, but the matrix $M$ must satisfy some conditions.

  1. It must be a stochastic matrix which means each of its rows must add up to $1$: $\sum_j M_{ij}=1, \; \forall i$. Assuming the initial distribution $P_0\{i\}$ is normalised, $\sum_i P_0\{i\}=1$, this ensures that the normalization is preserved by the transition matrix: $\sum_j P_1\{j\}=1$ where $P_1\{j\}=\sum_i P_0\{i\} M_{ij}$ (and so on, for all subsequent iterations). You can see that your example matrix $M$ satisfies this condition.
  2. It must satisfy $$\sum_j P_Z\{j\} M_{ji}=\sum_j P_Z\{i\} M_{ij}=P_Z\{i\}, \; \forall i.$$ This guarantees that $P_Z\{i\}$ is a steady state solution of the Markov chain. Physically, the first equality represents a balance of probability flows into, and out of, each state $i$, once the desired distribution $P_Z$ has been achieved. The second equality follows from condition 1. Together they establish $P_Z$ as a left eigenvector of the matrix $M$, with eigenvalue $1$. Again, if you do the calculations, you can see that your example matrix satisfies this condition.
  3. It is not necessary, but it is sufficient, and quite frequently imposed in practice, that the matrix satisfy the detailed balance condition \begin{align*} P_Z\{j\} M_{ji}&=P_Z\{i\} M_{ij}, \; \forall i,j. \\ \Rightarrow\quad \frac{M_{ji}}{M_{ij}} &= \frac{P_Z\{i\}}{P_Z\{j\} } , \; \forall i,j. \end{align*} This guarantees that each term in the sum in condition 2 satisfies the equality separately: at steady state the flow from $i\rightarrow j$ is exactly balanced by the flow from $j\rightarrow i$, for every $i,j$ pair. Interestingly, your example matrix $M$ does not satisfy the detailed balance condition; nonetheless, it generates the desired distribution. A way of constructing the matrix to satisfy detailed balance is described in the answer to this question: Designing a Markov chain given its steady state probabilities. If we apply that method to your distribution we get $$ M' = \begin{bmatrix} 0.6 & 0.4 & 0 \\ 0.2 & 0.4 & 0.4 \\ 0 & 0.4 & 0.6 \end{bmatrix} $$ However, this is not the only way of doing it: there is still plenty of flexibility in choosing the elements of $M$ to satisfy detailed balance.
  4. All the above relates to the steady state distribution. It is almost always desirable to ensure that an arbitrary initial distribution actually tends towards this steady state, and that imposes further conditions on $M$: it must be irreducible and aperiodic. For an irreducible transition matrix, every state is accessible from every other state, given sufficient iterations of $M$. In other words, it should not be possible to subdivide the states into two or more disconnected sets. The aperiodic condition avoids the possibility that the limiting state is an oscillation between two or more different distributions. There is some discussion of this here: What values makes this Markov chain aperiodic?. A periodic transition matrix will have more than one eigenvalue whose modulus is $1$, representing the non-stationary behaviour: this is to be avoided.
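Conditions 1-3 are easy to check numerically. Here is a quick sketch in Python (using numpy; the variable names and the small helper are mine, not part of the question), verifying that your $M$ satisfies conditions 1 and 2 but not detailed balance, while $M'$ satisfies all three:

```python
import numpy as np

# Matrix from the question, and the detailed-balance matrix M' from point 3
M = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
M_prime = np.array([[0.6, 0.4, 0.0],
                    [0.2, 0.4, 0.4],
                    [0.0, 0.4, 0.6]])
p_Z = np.array([0.2, 0.4, 0.4])   # the desired stationary distribution

for mat in (M, M_prime):
    # Condition 1: stochastic matrix, each row sums to 1
    assert np.allclose(mat.sum(axis=1), 1.0)
    # Condition 2: p_Z is a left eigenvector with eigenvalue 1
    assert np.allclose(p_Z @ mat, p_Z)

def satisfies_detailed_balance(mat):
    flows = p_Z[:, None] * mat          # flow i -> j: P_Z{i} * M_ij
    return np.allclose(flows, flows.T)  # compare with flow j -> i

print(satisfies_detailed_balance(M))        # False
print(satisfies_detailed_balance(M_prime))  # True
```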

I should mention the Metropolis-Hastings algorithm for constructing $M$, which is very useful when the state space is very large, and when the distribution probabilities are only known up to some common normalizing factor (which may be impractical to calculate). This is a common situation in physics simulations. The off-diagonal elements $M_{ij}$ are expressed as a product of two terms: a proposal probability $\alpha_{ij}$ and an acceptance probability $A_{ij}$. The elements of $\alpha$ are usually chosen to be nonzero for pairs of states which are "nearby" (in a sense connected with the physics); they are independent of $P_Z$, and typically $\alpha_{ij}=\alpha_{ji}$. The acceptance probability $A_{ij}$ involves the ratio $P_Z\{j\}/P_Z\{i\}$, and is chosen so that $A_{ij}/A_{ji}=P_Z\{j\}/P_Z\{i\}$, which (for symmetric $\alpha$) enforces detailed balance. The diagonal elements of $M$ are fixed by condition 1 above. More details are easy to come by elsewhere.
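As a minimal sketch of this construction for your three-state example (my own illustrative choices, not prescribed by the question: a symmetric proposal that picks each of the other two states with probability $1/2$, and the standard Metropolis acceptance $A_{ij}=\min(1, P_Z\{j\}/P_Z\{i\})$):

```python
import numpy as np

p_Z = np.array([0.2, 0.4, 0.4])   # target distribution (states 1, 2, 3)
n = len(p_Z)

# Symmetric proposal: from state i, propose each other state with prob 1/2
alpha = (np.ones((n, n)) - np.eye(n)) / (n - 1)

M = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            A = min(1.0, p_Z[j] / p_Z[i])   # Metropolis acceptance probability
            M[i, j] = alpha[i, j] * A
    # Diagonal fixed by condition 1 (rows must sum to 1)
    M[i, i] = 1.0 - M[i].sum()

print(M)
# p_Z is stationary by construction: p_Z @ M equals p_Z,
# and this M also satisfies detailed balance
```

Note that only the ratios $P_Z\{j\}/P_Z\{i\}$ enter, which is why the normalizing factor of $P_Z$ is never needed.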

Lastly, there is some scope for discussing whether one transition matrix is "better" than another, for example in the sense of converging more quickly towards the stationary distribution. Again, I don't think it's appropriate here to go into details, but the convergence rate is governed by the subdominant eigenvalues of $M$, particularly the second largest in modulus (the largest being $1$).
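To make this concrete for your example matrix (a numerical illustration of my own, using numpy): the eigenvalue moduli of $M$ are $1, 0.5, 0.5$, so the distance from the stationary distribution shrinks by roughly a factor $0.5$ per iteration.

```python
import numpy as np

M = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])

# Eigenvalue moduli: the largest is 1 (stationary distribution);
# the second largest sets the convergence rate
mods = sorted(np.abs(np.linalg.eigvals(M)), reverse=True)
print(mods)  # [1.0, 0.5, 0.5]

# Iterating from a concentrated initial distribution converges to p_Z
p = np.array([1.0, 0.0, 0.0])
for _ in range(50):
    p = p @ M
print(p)  # close to [0.2, 0.4, 0.4]
```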