I have read about the coupling-from-the-past algorithm, which is used for perfect sampling from the stationary distribution of a discrete Markov chain. My question is not exactly about this algorithm, but about why its proof idea cannot also be applied to "forward coupling", i.e. what is wrong with the following idea.
Let $\Omega$ be the set of states. Take a distribution on the space of "time step functions" $\mathcal{F} = \{f: \Omega \to \Omega\}$ that guarantees $\mathbb{P}[f(x) = y] = P(x, y)$ for all $x, y\in \Omega$, where $P(x,y)$ are the transition probabilities of an irreducible and aperiodic chain. For each $t\in\mathbb{Z}$ choose $f_t$ at random, independently according to this distribution, and for $t_1 < t_2$ set $f_{t_1}^{t_2} := f_{t_2 -1}\circ \cdots \circ f_{t_1}$.
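One concrete way to realize such a distribution on $\mathcal{F}$ is to draw $f(x) \sim P(x,\cdot)$ independently for each state $x$. A minimal sketch in Python (my own illustrative code; the function name is made up):

```python
import random

def sample_update_map(P):
    """Draw a random map f: states -> states with P[f(x) = y] = P[x][y],
    by sampling each value f(x) independently from the row P[x]."""
    return {
        x: random.choices(list(row.keys()), weights=list(row.values()))[0]
        for x, row in P.items()
    }

# Example: two-state chain; row x gives the transition probabilities P(x, .)
P = {1: {1: 0.5, 2: 0.5}, 2: {1: 1.0}}
f = sample_update_map(P)  # e.g. {1: 2, 2: 1}
```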
Now I want to do a "forward coupling". Let $M := \min\{t > 0: |f_0^t(\Omega)|=1\}$, i.e. the first time at which, starting at time $t=0$, the paths from all start values have coalesced. Of course, since the $f_t$'s are chosen at random, $M$ is a random variable, and we assume that our Markov chain guarantees $\mathbb{P}[M < \infty] = 1$. Now $f_0^M$ is a random constant function, so let $\tilde{y}\in \Omega$ be its unique image, i.e. $f_0^M(\Omega)=\{\tilde{y}\}$.
Say $\pi$ is the stationary distribution on $\Omega$ with respect to $P$. It is well known that $\tilde{y}$ does not have distribution $\pi$, and this is why we need to "couple from the past" instead (which brings its own difficulties); I have even seen simple counterexamples. But I don't understand why the following fails: for all $y\in\Omega$, all $k>0$, and an arbitrary $x\in\Omega$,
$$\mathbb{P}[\tilde{y}=y]=\mathbb{P}[f_0^M(\Omega)=\{y\}]=\mathbb{P}[f_0^M(x)=y] = \mathbb{P}[f_0^M\circ f_{-k}^0(x)=y] = \mathbb{P}[f_{-k}^M(x)=y] = P^{M+k}(x,y)\longrightarrow \pi(y)$$ as $k\longrightarrow\infty$. The third equality holds because $f_0^M$ is a constant function, so precomposing it with $f_{-k}^0$ changes nothing.
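For concreteness, one of the simple counterexamples can even be simulated. Take $\Omega=\{1,2\}$ with $P(1,1)=P(1,2)=1/2$ and $P(2,1)=1$, so $\pi=(2/3,1/3)$. Under forward coupling the composed map can only become constant by mapping everything to $1$, so $\tilde{y}=1$ always. A sketch (my own illustrative code):

```python
import random

# Two-state chain: 1 -> {1, 2} with prob 1/2 each, 2 -> 1 always.
# Stationary distribution: pi = (2/3, 1/3).

def random_map():
    """One time-step function f with P[f(x) = y] = P(x, y)."""
    return {1: random.choice([1, 2]), 2: 1}

def forward_coalesce():
    """Run forward from time 0 until all start states have coalesced;
    return the common value tilde_y, i.e. the unique point of f_0^M(Omega)."""
    image = {1, 2}                       # current image f_0^t(Omega)
    while len(image) > 1:
        f = random_map()
        image = {f[x] for x in image}
    return image.pop()

random.seed(1)
samples = [forward_coalesce() for _ in range(10000)]
# Every run coalesces at state 1, although pi(1) = 2/3:
print(samples.count(1) / len(samples))  # -> 1.0
```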
Please find the mistake. Thanks in advance.
I think I have found my own mistake.
The problematic point is the probability $P^{M+k}(x, \tilde{y})$. As I mentioned, $M$ is a random variable that depends on the randomly chosen $f_t$'s, so $P^{M+k}(x, \tilde{y})$ is not a deterministic matrix power, and it is illegal to conclude that it converges to $\pi(\tilde{y})$. So to speak, the value of $M$ contains information about the outcome of the randomness.
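For contrast, composing the same kind of random maps from the past, reusing the already drawn maps while extending further back, does give an exact sample from $\pi$. A minimal sketch (my own illustrative code; a serious implementation would extend backwards in doubling steps) on the two-state chain with $P(1,1)=P(1,2)=1/2$ and $P(2,1)=1$, whose stationary distribution is $\pi=(2/3,1/3)$:

```python
import random

# Two-state chain: 1 -> {1, 2} with prob 1/2 each, 2 -> 1 always; pi = (2/3, 1/3).

def random_map():
    """One time-step function f with P[f(x) = y] = P(x, y)."""
    return {1: random.choice([1, 2]), 2: 1}

def cftp():
    """Coupling from the past: extend the composition f_{-1} o ... o f_{-k}
    one step further back, REUSING the previously drawn maps, until it is
    constant; return its unique value."""
    maps = []                             # maps[k-1] is f_{-k}
    while True:
        maps.append(random_map())
        image = {1, 2}
        for f in reversed(maps):          # apply the earliest map first
            image = {f[x] for x in image}
        if len(image) == 1:
            return image.pop()

random.seed(2)
samples = [cftp() for _ in range(30000)]
print(samples.count(1) / len(samples))  # close to pi(1) = 2/3
```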
In the same (wrong) way one could argue for an arbitrary $\tilde{y}\in\Omega$:
Start with an arbitrary state $x\in\Omega$, run the chain long enough for good mixing, say $T$ steps, so that $P^t(x, \cdot)$ is $\varepsilon$-close to the stationary distribution $\pi$ for all $t\geq T$, and let $M := \min \{T'\geq T: f_{0}^{T'}(x)=\tilde{y}\}$. Then certainly after $M$ steps starting from $x$ the chain is in state $\tilde{y}$, by definition of $M$, and since $M\geq T$ its distribution should be $\varepsilon$-close to $\pi$; hence, since $\varepsilon$ was arbitrary, $\pi(\tilde{y})=1$, which is clearly false.
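This flaw is also easy to see numerically: stopping at the random, outcome-dependent first time the chain sits in $\tilde{y}$ after $T$ of course always yields $\tilde{y}$, while stopping at a fixed time recovers the correct stationary mass. A sketch (my own illustrative code) on the two-state chain with $P(1,1)=P(1,2)=1/2$, $P(2,1)=1$, taking $\tilde{y}=2$ with $\pi(2)=1/3$:

```python
import random

# Two-state chain: 1 -> {1, 2} with prob 1/2 each, 2 -> 1 always; pi(2) = 1/3.

def step(x):
    return random.choice([1, 2]) if x == 1 else 1

def stop_at_first_visit(target, T=100):
    """Run from state 1 and stop at the first time >= T the chain is in
    `target` -- a stopping time that depends on the chain's own randomness."""
    x, t = 1, 0
    while t < T or x != target:
        x = step(x)
        t += 1
    return x

def fixed_time_state(T=100):
    """Run from state 1 for a FIXED, deterministic number of steps T."""
    x = 1
    for _ in range(T):
        x = step(x)
    return x

random.seed(3)
# Conditioning on the random stopping time: always the target, by construction.
assert all(stop_at_first_visit(2) == 2 for _ in range(100))
# A fixed time, in contrast, recovers pi(2) = 1/3:
samples = [fixed_time_state() for _ in range(30000)]
print(samples.count(2) / len(samples))  # close to 1/3
```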
Wow, this tells me once again that one has to be careful in mathematics.