Let
- $(E,\mathcal E,\lambda)$ be a measure space;
- $p:E\to[0,\infty)$ be $\mathcal E$-measurable with $$c:=\int p\:{\rm d}\lambda\in(0,\infty)$$ and $\mu$ denote the measure with density $\frac pc$ with respect to $\lambda$;
- $q:E^2\to[0,\infty)$ be $\mathcal E^{\otimes2}$-measurable with $$c_x:=\int q(x,\;\cdot\;)\:{\rm d}\lambda\in(0,\infty)\;\;\;\text{for all }x\in E$$ and $Q(x,\;\cdot\;)$ denote the measure with density $\frac{q(x,\;\cdot\;)}{c_x}$ with respect to $\lambda$;
- $$\alpha(x,y):=\left.\begin{cases}\displaystyle\min\left(1,\frac{p(y)q(y,x)}{p(x)q(x,y)}\right)&\text{, if }p(x)q(x,y)\ne0\\1&\text{, otherwise}\end{cases}\right\}\;\;\;\text{for }x,y\in E;$$
- $$r(x):=1-\int Q(x,{\rm d}y)\alpha(x,y)\;\;\;\text{for }x\in E;$$
- $\delta_x$ denote the Dirac measure on $(E,\mathcal E)$ at $x\in E$ and $$\kappa(x,B):=\int_BQ(x,{\rm d}y)\alpha(x,y)+r(x)\delta_x(B)\;\;\;\text{for }(x,B)\in E\times\mathcal E;$$
- $(\Omega,\mathcal A,\operatorname P)$ be a probability space;
- $(X_n)_{n\in\mathbb N_0}$ and $(Y_n)_{n\in\mathbb N}$ denote the Markov chain and proposal sequence generated by the Metropolis-Hastings algorithm with proposal kernel $Q$ and target distribution $\mu$, respectively, and $$Z_n:=(X_{n-1},Y_n)\;\;\;\text{for }n\in\mathbb N.$$
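As a concrete sanity check of these definitions, here is a minimal sketch on a two-point space $E=\{0,1\}$ with $\lambda$ the counting measure (all numerical choices are hypothetical); it verifies that $\kappa(x,\;\cdot\;)$ is a probability measure, i.e. $\kappa(x,E)=1$, and that detailed balance $p(x)q(x,y)\alpha(x,y)=p(y)q(y,x)\alpha(y,x)$ holds.

```python
# Hypothetical two-point example: E = {0, 1}, lambda = counting measure.
p = {0: 0.2, 1: 0.8}                      # target density, c = 1
q = {(0, 0): 0.5, (0, 1): 0.5,            # proposal density; rows sum to 1,
     (1, 0): 0.5, (1, 1): 0.5}            # so c_x = 1 and Q(x, y) = q(x, y)

def alpha(x, y):
    """Metropolis-Hastings acceptance probability."""
    num, den = p[y] * q[(y, x)], p[x] * q[(x, y)]
    return 1.0 if den == 0 else min(1.0, num / den)

def r(x):
    """Total rejection probability from x."""
    return 1.0 - sum(q[(x, y)] * alpha(x, y) for y in (0, 1))

def kappa(x, B):
    """Metropolis-Hastings transition kernel kappa(x, B) for B a subset of E."""
    return sum(q[(x, y)] * alpha(x, y) for y in B) + r(x) * (x in B)

for x in (0, 1):
    assert abs(kappa(x, {0, 1}) - 1.0) < 1e-12   # kappa(x, .) is a probability
for x in (0, 1):
    for y in (0, 1):                              # detailed balance w.r.t. p
        assert abs(p[x] * q[(x, y)] * alpha(x, y)
                   - p[y] * q[(y, x)] * alpha(y, x)) < 1e-12
```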
By construction, there is an independent, identically $\mathcal U_{[0,\:1)}$-distributed $[0,1)$-valued process $(U_n)_{n\in\mathbb N}$ on $(\Omega,\mathcal A,\operatorname P)$ such that
- $(U_1,\ldots,U_n)$ and $(Z_1,\ldots,Z_n)$ are independent;
- $$X_n=\begin{cases}Y_n&\text{if }U_n\le\alpha(Z_n);\\X_{n-1}&\text{otherwise}\end{cases}$$
for all $n\in\mathbb N$. Now, let $$A_n:=\left\{U_1>\alpha(Z_1),\ldots,U_{n-1}>\alpha(Z_{n-1}),U_n\le\alpha(Z_n)\right\}\;\;\;\text{for }n\in\mathbb N.$$ It should hold that $$\operatorname P\left[A_n\mid Z_1,\ldots,Z_n\right]=\prod_{i=1}^{n-1}\left(1-\alpha(Z_i)\right)\alpha(Z_n)\tag1$$ for all $n\in\mathbb N$. Let $\tau_0:=0$ and $$\tau_k:=\inf\{n>\tau_{k-1}:U_n\le\alpha(Z_n)\}\;\;\;\text{for }k\in\mathbb N.$$ Moreover, let $$\tilde X_k:=\left.\begin{cases}X_{\tau_k}&\text{, if }\tau_k<\infty;\\X_{\tau_{k-1}}&\text{, otherwise}\end{cases}\right\}\;\;\;\text{for }k\in\mathbb N_0.$$ Let $B_0,B_1\in\mathcal E$. Note that \begin{equation}\begin{split}&\operatorname P\left[\tilde X_0\in B_0,\tilde X_1\in B_1\right]\\&\;\;\;\;=\sum_{n\in\mathbb N}\operatorname P\left[Z_n\in B_0\times B_1,A_n\right]+\operatorname P\left[X_0\in B_0\cap B_1,\tau_1=\infty\right].\end{split}\tag2\end{equation}
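The construction of $(X_n)$, $(U_n)$, $\tau_k$ and $\tilde X_k$ can be simulated directly. The following sketch (a hypothetical two-point space $E=\{0,1\}$ with $p(0)=0.2$, $p(1)=0.8$ and a symmetric proposal, so $\tau_1<\infty$ almost surely) checks, pathwise, the identity behind $(2)$: the events $A_n=\{\tau_1=n\}$ partition $\{\tau_1<\infty\}$, and on $A_n$ one has $Z_n=(X_0,Y_n)=(\tilde X_0,\tilde X_1)$.

```python
import random

random.seed(0)

# Hypothetical example: E = {0, 1}, target weights p, symmetric proposal.
p = {0: 0.2, 1: 0.8}

def alpha(x, y):
    return min(1.0, p[y] / p[x])          # q symmetric, so it cancels

for _ in range(10_000):
    x0 = random.choice((0, 1))            # X_0
    x, n, tau1 = x0, 0, None
    while tau1 is None:
        n += 1
        y = random.choice((0, 1))         # Y_n ~ Q(X_{n-1}, .)
        u = random.random()               # U_n ~ U[0, 1)
        if u <= alpha(x, y):              # first acceptance, i.e. A_n occurs
            tau1, z = n, (x, y)           # Z_n = (X_{n-1}, Y_n)
            x = y                         # X_n = Y_n
        # on rejection X_n = X_{n-1}, so x is left unchanged
    # On A_{tau1}: Z_{tau1} = (X_0, Y_{tau1}) = (X~_0, X~_1)
    assert z == (x0, x)
```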
Now, I would really like to conclude that \begin{equation}\begin{split}&\operatorname P\left[Z_n\in B_0\times B_1,A_n\right]\\&\;\;\;\;=\operatorname E\left[\prod_{i=1}^{n-1}\left(1-\alpha(X_0,Y_i)\right)\alpha(X_0,Y_n);(X_0,Y_n)\in B_0\times B_1\right]\\&\;\;\;\;=\left(\int\operatorname P\left[X_0\in{\rm d}x\right]\int Q(x,{\rm d}y)\left(1-\alpha(x,y)\right)\right)^{n-1}\int_{B_0}\operatorname P\left[X_0\in{\rm d}x\right]\int_{B_1}Q(x,{\rm d}y)\alpha(x,y)\end{split}\tag3\end{equation} for all $n\in\mathbb N$. However, while the first equality should hold by $(2)$, I don't see how to rigorously justify the second equality (though it is intuitively clear).
I think we need something like "$Y_1,\ldots,Y_n$ are conditionally independent given $X_0$ on $A_n$". The distribution of each $Y_i$ is $\mathcal L(X_{i-1})Q$. But on the right-hand side of the first equality of $(3)$, we somehow lost the information that $X_0=\cdots=X_{n-1}$ on $A_n$. Did I make a mistake before?
I don't think you can separate the integrals like that in the last equation. To see why, you could instead use the kernel on the augmented state space (call it $P$), and then integrate directly using the finite-dimensional distribution $\mathsf{P}_{x_0} := \delta_{x_0} \otimes P^{\otimes n}$. Also, your equation $(1)$ seems strange, because the $Z$ sequence itself depends on the uniform variates, so conditioning on $Z_1,\ldots,Z_n$ carries information about the $U_i$.
Details:
Assuming WLOG that each $Q(x,\;\cdot\;)$ has density $q(x,\;\cdot\;)$ with respect to Lebesgue measure (so $c_x=1$), the kernel on the augmented state space is $$ P[(x_0, y_0, u_0), (dx_1, dy_1, du_1)] = q(x_0, y_1) 1(u_1 \le \alpha(x_0, y_1))\,\delta_{y_1} (dx_1) \,du_1 \,dy_1 + q(x_0, y_1) 1(u_1 > \alpha(x_0, y_1))\,\delta_{x_0} (dx_1)\, du_1\, dy_1, $$ where $du_1$ denotes Lebesgue measure on $[0,1)$. In particular, if $x_0 \notin B$, then $$ P[ X_1 \in B \mid X_0 = x_0, Y_0 = y_0, U_0 = u_0] = \int_B q(x_0, y_1)\, \alpha(x_0, y_1) \,dy_1 $$
and $$ P[ X_1 =x_0 \mid X_0 = x_0, Y_0 = y_0, U_0 = u_0] = r(x_0). $$
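To make the augmented kernel concrete, here is a sketch of one step on a hypothetical two-point space $E=\{0,1\}$ (counting measure in place of Lebesgue, $c_x=1$). A Monte Carlo estimate of $P[X_1\in B\mid X_0=x_0,\ldots]$ for $x_0\notin B$ is compared with $\sum_{y\in B} q(x_0,y)\,\alpha(x_0,y)$; note that in the discrete setting the atom at $x_0$ has mass $q(x_0,x_0)\alpha(x_0,x_0)+r(x_0)$, and it is only on a continuous space, where the self-proposal term vanishes, that it equals $r(x_0)$ exactly as in the display above.

```python
import random

random.seed(1)

# Hypothetical example: target weights p, symmetric proposal q.
p = {0: 0.2, 1: 0.8}
q = {(x, y): 0.5 for x in (0, 1) for y in (0, 1)}

def alpha(x, y):
    return min(1.0, (p[y] * q[(y, x)]) / (p[x] * q[(x, y)]))

def step(x0):
    """One draw from the augmented kernel P[(x0, y0, u0), .]; returns (x1, y1, u1)."""
    y1 = random.choice((0, 1))            # Y_1 ~ q(x0, .)
    u1 = random.random()                  # U_1 ~ U[0, 1)
    x1 = y1 if u1 <= alpha(x0, y1) else x0
    return x1, y1, u1

N, x0, B = 200_000, 1, {0}                # x0 = 1 is not in B = {0}
hits_B = sum(step(x0)[0] in B for _ in range(N)) / N
stay = sum(step(x0)[0] == x0 for _ in range(N)) / N

exact_B = sum(q[(x0, y)] * alpha(x0, y) for y in B)              # = 0.125 here
r_x0 = 1.0 - sum(q[(x0, y)] * alpha(x0, y) for y in (0, 1))      # = 0.375 here
stay_exact = q[(x0, x0)] * alpha(x0, x0) + r_x0                  # = 0.875 here

assert abs(hits_B - exact_B) < 0.01
assert abs(stay - stay_exact) < 0.01
```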
So \begin{align*} &\mathsf{P}_{x_0}\left[Z_n\in B_0\times B_1,A_n\right] \\ &= \mathsf{P}_{x_0}\left[X_{n-1} \in B_0, Y_n \in B_1 ,U_1>\alpha(X_0, Y_1),\ldots,U_{n-1}>\alpha(X_{n-2}, Y_{n-1}),U_n\le\alpha(X_{n-1}, Y_{n})\right] \\ &= \mathsf{P}_{x_0}\left[g(x_0,y_0,u_0, \ldots, x_{n-1},y_{n-1},u_{n-1})\, 1(u_n \le \alpha(x_{n-1}, y_n))\,1(y_n \in B_1) \right] \\ &= \int \cdots \int P[(x_0, y_0, u_0), (dx_1, dy_1, du_1)] \cdots P[(x_{n-1}, y_{n-1}, u_{n-1}), (dx_n, dy_n, du_n)]\,g(x_0,y_0,u_0, \ldots, x_{n-1},y_{n-1},u_{n-1})\, 1(u_n \le \alpha(x_{n-1}, y_n))\,1(y_n \in B_1) \\ &= \int \cdots \int P[(x_0, y_0, u_0), (dx_1, dy_1, du_1)] \cdots P[(x_{n-2}, y_{n-2}, u_{n-2}), (dx_{n-1}, dy_{n-1}, du_{n-1})]\,g(x_0,y_0,u_0, \ldots, x_{n-1},y_{n-1},u_{n-1}) \int_{B_1} q(x_{n-1}, y_n)\, \alpha(x_{n-1}, y_n)\, dy_n \\ &= \int \cdots \int P[(x_0, y_0, u_0), (dx_1, dy_1, du_1)] \cdots P[(x_{n-2}, y_{n-2}, u_{n-2}), (dx_{n-1}, dy_{n-1}, du_{n-1})]\,\tilde{g}(x_0,y_0,u_0, \ldots, x_{n-2},y_{n-2},u_{n-2})\, 1(x_{n-1} \in B_0)\, 1(u_{n-1} > \alpha(x_{n-2}, y_{n-1}))\int_{B_1} q(x_{n-1}, y_n)\, \alpha(x_{n-1}, y_n) \,dy_n \end{align*} where $g(x_0,y_0,u_0, \ldots, x_{n-1},y_{n-1},u_{n-1}) := 1(x_{n-1}\in B_0)\prod_{i=1}^{n-1}1(u_i>\alpha(x_{i-1},y_i))$ and $\tilde{g}$ collects the factors of $g$ not involving $(x_{n-1},y_{n-1},u_{n-1})$.
In the last line, when you start integrating with respect to $x_{n-1}$, $y_{n-1}$ and $u_{n-1}$, you're not just integrating those indicators: the $\int_{B_1} q(x_{n-1}, y_n)\, \alpha(x_{n-1}, y_n)\, dy_n$ term, which depends on $x_{n-1}$, is in there too.
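The failure of the separation can also be seen numerically. On a hypothetical two-point space with $p(0)=0.2$, $p(1)=0.8$, a symmetric proposal, $X_0$ uniform on $\{0,1\}$, $B_0=\{1\}$, $B_1=\{0\}$ and $n=2$, the sketch below computes $\operatorname P[Z_n\in B_0\times B_1, A_n]$ exactly (keeping everything inside one integral over $x$, since $X_0=\cdots=X_{n-1}$ on $A_n$) and compares it with the separated right-hand side of $(3)$; the two values differ.

```python
# Hypothetical counterexample: E = {0, 1}, counting measure, symmetric proposal.
p = {0: 0.2, 1: 0.8}
Q = {(x, y): 0.5 for x in (0, 1) for y in (0, 1)}

def alpha(x, y):
    return min(1.0, p[y] / p[x])          # q symmetric, so it cancels

mu0 = {0: 0.5, 1: 0.5}                    # law of X_0
B0, B1, n = {1}, {0}, 2

def rho(x):                               # one-step rejection probability from x
    return sum(Q[(x, y)] * (1 - alpha(x, y)) for y in (0, 1))

def acc(x, B):                            # integral of Q(x, dy) alpha(x, y) over B
    return sum(Q[(x, y)] * alpha(x, y) for y in B)

# Correct value: the x-integral is NOT split, since X_0 = ... = X_{n-1} on A_n.
correct = sum(mu0[x] * rho(x) ** (n - 1) * acc(x, B1) for x in B0)

# The separated right-hand side of (3):
separated = (sum(mu0[x] * rho(x) for x in (0, 1)) ** (n - 1)
             * sum(mu0[x] * acc(x, B1) for x in B0))

print(correct, separated)                 # 0.0234375 vs 0.01171875
```

The discrepancy comes from $\rho(0)=0\ne\rho(1)$: the rejection probability depends on $x$, so its $(n-1)$-st power cannot be pulled out of the integral over the law of $X_0$.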
Other notes: