Problems understanding Markov chains in the context of coupling from the past


I have read about the coupling from the past algorithm, which is used for perfect sampling from the stationary distribution of a discrete Markov chain. My question is not exactly about this algorithm, but about why its proof idea cannot also be applied to "forward coupling"; i.e., what is wrong with the following idea.

Let $\Omega$ be the (finite) set of states. Take a distribution on the space of "time-step functions" $\mathcal{F} = \{f: \Omega \to \Omega\}$ that guarantees $\mathbb{P}[f(x) = y] = P(x, y)$ for all $x, y\in \Omega$, where $P$ is the transition matrix of an irreducible, aperiodic chain. For each $t\in\mathbb{Z}$ choose $f_t$ independently at random according to this distribution, and for $t_1 < t_2$ set $f_{t_1}^{t_2} := f_{t_2 -1}\circ \dots \circ f_{t_1}$.
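As a concrete illustration (my own hypothetical example, not from the question), such a distribution on time-step functions can be sketched in Python for a two-state chain with $P(0,1)=1$ and $P(1,0)=P(1,1)=1/2$: the image of each state is drawn by inverse-CDF lookup from an independent uniform, so marginally $\mathbb{P}[f(x)=y]=P(x,y)$.

```python
import random

# Hypothetical example chain (not from the question):
# Omega = {0, 1}, P(0,1) = 1, P(1,0) = P(1,1) = 1/2.
P = [[0.0, 1.0],
     [0.5, 0.5]]

def random_map(rng):
    """Draw one time-step function f: Omega -> Omega with
    P[f(x) = y] = P[x][y], using one independent uniform per state."""
    f = []
    for x in range(len(P)):
        u = rng.random()
        cum = 0.0
        for y, p in enumerate(P[x]):   # inverse-CDF lookup
            cum += p
            if u < cum:
                break
        f.append(y)
    return tuple(f)

rng = random.Random(0)
f = random_map(rng)   # a tuple (f(0), f(1)); here f(0) == 1 always
```

Composing independently drawn maps, $f_{t_2-1}\circ\dots\circ f_{t_1}$, then gives exactly the $f_{t_1}^{t_2}$ above.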

Now I want to do a "forward coupling". Let $M := \min\{t > 0: |f_0^t(\Omega)|=1\}$, i.e. the first time at which, starting at time $t=0$, the paths from all start values have coalesced. Of course, since the $f_t$'s are chosen at random, $M$ is a random variable, and we assume that our Markov chain guarantees $\mathbb{P}[M < \infty] = 1$. Now $f_0^M$ is a random constant function, so let $\tilde{y}\in \Omega$ be its unique image, i.e. $f_0^M(\Omega)=\{\tilde{y}\}$.
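This forward coupling is easy to simulate. Here is a minimal sketch on a hypothetical two-state chain with $P(0,1)=1$, $P(1,0)=P(1,1)=1/2$, so $\pi=(1/3,\,2/3)$: since $f(0)=1$ always, coalescence can only ever happen in state $1$, so $\tilde{y}=1$ in every run even though $\pi(1)=2/3$.

```python
import random

# Hypothetical two-state chain: P(0,1) = 1, P(1,0) = P(1,1) = 1/2,
# with stationary distribution pi = (1/3, 2/3).
def random_map(rng):
    # P[f(0)=1] = 1; P[f(1)=0] = P[f(1)=1] = 1/2
    return (1, 0 if rng.random() < 0.5 else 1)

def forward_couple(rng):
    """Compose fresh maps f_0, f_1, ... until |f_0^t(Omega)| = 1;
    return (M, y_tilde)."""
    states = {0, 1}
    t = 0
    while len(states) > 1:
        f = random_map(rng)
        states = {f[x] for x in states}
        t += 1
    return t, next(iter(states))

rng = random.Random(1)
samples = [forward_couple(rng)[1] for _ in range(10_000)]
# Since f(0) = 1 always, every run coalesces in state 1,
# although pi(1) = 2/3.
```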

Say $\pi$ is the stationary distribution on $\Omega$ with respect to $P$. It is well known that $\tilde{y}$ does not have distribution $\pi$, which is why one needs to "couple from the past" instead (which brings its own difficulties), and I have seen simple counterexamples. But I don't understand why, because for all $y\in\Omega$, $k>0$ and an arbitrary $x\in\Omega$:

$$\mathbb{P}[\tilde{y}=y]=\mathbb{P}[f_0^M(\Omega)=\{y\}]=\mathbb{P}[f_0^M(x)=y] = \mathbb{P}[f_0^M\circ f_{-k}^0(x)=y] = \mathbb{P}[f_{-k}^M(x)=y] = P^{M+k}(x,y)\longrightarrow \pi(y)$$ as $k\longrightarrow\infty$. The third equality comes from the fact that $f_0^M$ is a constant function.

Please find the mistake. Thanks in advance.


BEST ANSWER

I think I have found my own mistake.

The problematic point is the last step, where I treat $\mathbb{P}[f_{-k}^M(x)=\tilde{y}]$ as an $(M+k)$-step transition probability. As I mentioned, $M$ is a random variable that depends on the very same randomly chosen $f_t$'s, so conditioning on $M$ biases the maps $f_0, \dots, f_{M-1}$; the identity with $P^{M+k}(x,\cdot)$, and hence the convergence to $\pi$, is not justified. So to speak, the value of $M$ contains information about the outcome of the randomness.

In the same (wrong) way one could argue for an arbitrary $\tilde{y}\in\Omega$:

Start with an arbitrary state $x\in\Omega$ and run the chain long enough for good mixing: choose $T$ so that $P^t(x, \cdot)$ is $\varepsilon$-close to the stationary distribution $\pi$ for all $t\geq T$, and let $M := \min \{t\geq T: f_{0}^{t}(x)=\tilde{y}\}$ (assuming the chain visits $\tilde{y}$ eventually, so $M<\infty$ a.s.). Then after $M$ steps starting from $x$ the chain is certainly in state $\tilde{y}$, by definition of $M$, while its distribution is supposedly $\varepsilon$-close to $\pi$; since $\varepsilon$ was arbitrary, this would give $\pi(\tilde{y})=1$, which is absurd. The flaw is the same: $M$ is random and correlated with the trajectory, so a mixing bound for deterministic $t$ does not apply at the random time $M$.

Wow, this tells me once again that one has to be careful in mathematics.

ANSWER

If I understand your argument correctly, the issue is not in your math but rather in its algorithmic implications. You only obtain a sample from the stationary distribution as $k\to\infty$, so you never know when to stop mixing, and thus you cannot guarantee an exact sample after finitely many steps.

The key part of coupling from the past is that you "re-use" the randomness. For instance, instead of your $f_t(X)$ where $X$ is random, you use updates $f_t(x, U_t)$ where each $U_t$ is sampled exactly once. You then start a chain "from the past" at every state $x$. If all chains have coupled by the time you reach $t=0$, the common state at time $0$ is an exact draw from the stationary distribution. Otherwise, you restart from further in the past, but you reuse the same $U_t$ as before for the time steps you have already seen, so that part of the run, given the previous one, is deterministic.
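A hedged sketch of this scheme, again on a hypothetical two-state chain with $P(0,1)=1$, $P(1,0)=P(1,1)=1/2$ and $\pi=(1/3,\,2/3)$ (the doubling of the horizon $T$ is one common choice for "restarting from further in the past"):

```python
import random

# Hypothetical two-state chain: P(0,1) = 1, P(1,0) = P(1,1) = 1/2,
# with stationary distribution pi = (1/3, 2/3).
def step(x, u):
    """Deterministic update f(x, u) driven by a single uniform u,
    shared by all chains at that time step."""
    if x == 0:
        return 1                  # state 0 always moves to 1
    return 0 if u < 0.5 else 1    # state 1 flips a fair coin

def cftp(rng):
    """Run from time -T to 0 from every start state, reusing the same
    uniforms U_t on every restart and doubling T until coalescence."""
    us = []   # us[k] drives the step at time -(k+1)
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())   # fresh randomness only for new times
        states = {0, 1}
        for k in range(T - 1, -1, -1):   # apply f_{-T}, ..., f_{-1} in order
            states = {step(x, us[k]) for x in states}
        if len(states) == 1:
            return next(iter(states))    # state at time 0: exact draw
        T *= 2

rng = random.Random(2)
n = 30_000
freq1 = sum(cftp(rng) == 1 for _ in range(n)) / n
# freq1 should be close to pi(1) = 2/3
```

Note the contrast with the forward scheme: here coalescence is checked at the fixed time $0$, not at the random coalescence time, which is what removes the bias.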

This way, if you find a good coupling, you can get a nice bound on the expected running time while ensuring an exact sample.