Let
- $(E,\mathcal E,\lambda)$ be a measure space;
- $p:E\to[0,\infty)$ be $\mathcal E$-measurable with $$c:=\int p\:{\rm d}\lambda\in(0,\infty)$$ and $\mu$ denote the measure with density $\frac pc$ with respect to $\lambda$;
- $q:E^2\to[0,\infty)$ be $\mathcal E^{\otimes2}$-measurable with $$c_x:=\int q(x,\;\cdot\;)\:{\rm d}\lambda\in(0,\infty)\;\;\;\text{for all }x\in E$$ and $Q(x,\;\cdot\;)$ denote the measure with density $\frac{q(x,\;\cdot\;)}{c_x}$ with respect to $\lambda$;
- $$\alpha(x,y):=\left.\begin{cases}\displaystyle\min\left(1,\frac{p(y)q(y,x)}{p(x)q(x,y)}\right)&\text{, if }p(x)q(x,y)\ne0\\1&\text{, otherwise}\end{cases}\right\}\;\;\;\text{for }x,y\in E;$$
- $$r(x):=1-\int Q(x,{\rm d}y)\alpha(x,y)\;\;\;\text{for }x\in E;$$
- $\delta_x$ denote the Dirac measure on $(E,\mathcal E)$ at $x\in E$ and $$\kappa(x,B):=\int_BQ(x,{\rm d}y)\alpha(x,y)+r(x)\delta_x(B)\;\;\;\text{for }(x,B)\in E\times\mathcal E;$$
- $(\Omega,\mathcal A,\operatorname P)$ be a probability space;
- $(X_n)_{n\in\mathbb N_0}$ and $(Y_n)_{n\in\mathbb N}$ denote the Markov chain and proposal sequence generated by the Metropolis-Hastings algorithm with proposal kernel $Q$ and target distribution $\mu$, respectively, and $$Z_n:=(X_{n-1},Y_n)\;\;\;\text{for }n\in\mathbb N.$$
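As a concrete sanity check of these definitions, here is a minimal sketch on a two-point space $E=\{0,1\}$ with $\lambda$ the counting measure (all numerical choices are hypothetical); it verifies that $\kappa(x,\;\cdot\;)$ is a probability measure, i.e. $\kappa(x,E)=1$, and that detailed balance $p(x)q(x,y)\alpha(x,y)=p(y)q(y,x)\alpha(y,x)$ holds.

```python
# Hypothetical two-point example: E = {0, 1}, lambda = counting measure.
p = {0: 0.2, 1: 0.8}                      # target density, c = 1
q = {(0, 0): 0.5, (0, 1): 0.5,            # proposal density; rows sum to 1,
     (1, 0): 0.5, (1, 1): 0.5}            # so c_x = 1 and Q(x, y) = q(x, y)

def alpha(x, y):
    """Metropolis-Hastings acceptance probability."""
    num, den = p[y] * q[(y, x)], p[x] * q[(x, y)]
    return 1.0 if den == 0 else min(1.0, num / den)

def r(x):
    """Total rejection probability from x."""
    return 1.0 - sum(q[(x, y)] * alpha(x, y) for y in (0, 1))

def kappa(x, B):
    """Metropolis-Hastings transition kernel kappa(x, B) for B a subset of E."""
    return sum(q[(x, y)] * alpha(x, y) for y in B) + r(x) * (x in B)

for x in (0, 1):
    assert abs(kappa(x, {0, 1}) - 1.0) < 1e-12   # kappa(x, .) is a probability
for x in (0, 1):
    for y in (0, 1):                              # detailed balance w.r.t. p
        assert abs(p[x] * q[(x, y)] * alpha(x, y)
                   - p[y] * q[(y, x)] * alpha(y, x)) < 1e-12
```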
By construction, there is an independent, identically $\mathcal U_{[0,\:1)}$-distributed $[0,1)$-valued process $(U_n)_{n\in\mathbb N}$ on $(\Omega,\mathcal A,\operatorname P)$ such that
- $(U_1,\ldots,U_n)$ and $(Z_1,\ldots,Z_n)$ are independent;
- $$X_n=\begin{cases}Y_n&\text{if }U_n\le\alpha(Z_n);\\X_{n-1}&\text{otherwise}\end{cases}$$
for all $n\in\mathbb N$. Now, let $$A_n:=\left\{U_1>\alpha(Z_1),\ldots,U_{n-1}>\alpha(Z_{n-1}),U_n\le\alpha(Z_n)\right\}\;\;\;\text{for }n\in\mathbb N.$$ It should hold that $$\operatorname P\left[A_n\mid Z_1,\ldots,Z_n\right]=\prod_{i=1}^{n-1}\left(1-\alpha(Z_i)\right)\alpha(Z_n)\tag1$$ for all $n\in\mathbb N$. Let $\tau_0:=0$ and $$\tau_k:=\inf\{n>\tau_{k-1}:U_n\le\alpha(Z_n)\}\;\;\;\text{for }k\in\mathbb N.$$ Moreover, let $$\tilde X_k:=\left.\begin{cases}X_{\tau_k}&\text{, if }\tau_k<\infty;\\X_{\tau_{k-1}}&\text{, otherwise}\end{cases}\right\}\;\;\;\text{for }k\in\mathbb N_0.$$ Let $B_0,B_1\in\mathcal E$. Note that \begin{equation}\begin{split}&\operatorname P\left[\tilde X_0\in B_0,\tilde X_1\in B_1\right]\\&\;\;\;\;=\sum_{n\in\mathbb N}\operatorname P\left[Z_n\in B_0\times B_1,A_n\right]+\operatorname P\left[X_0\in B_0\cap B_1,\tau_1=\infty\right].\end{split}\tag2\end{equation}
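The construction of $(X_n)$, $(U_n)$, $\tau_k$ and $\tilde X_k$ can be simulated directly. The following sketch (a hypothetical two-point space $E=\{0,1\}$ with $p(0)=0.2$, $p(1)=0.8$ and a symmetric proposal, so $\tau_1<\infty$ almost surely) checks, pathwise, the identity behind $(2)$: the events $A_n=\{\tau_1=n\}$ partition $\{\tau_1<\infty\}$, and on $A_n$ one has $Z_n=(X_0,Y_n)=(\tilde X_0,\tilde X_1)$.

```python
import random

random.seed(0)

# Hypothetical example: E = {0, 1}, target weights p, symmetric proposal.
p = {0: 0.2, 1: 0.8}

def alpha(x, y):
    return min(1.0, p[y] / p[x])          # q symmetric, so it cancels

for _ in range(10_000):
    x0 = random.choice((0, 1))            # X_0
    x, n, tau1 = x0, 0, None
    while tau1 is None:
        n += 1
        y = random.choice((0, 1))         # Y_n ~ Q(X_{n-1}, .)
        u = random.random()               # U_n ~ U[0, 1)
        if u <= alpha(x, y):              # first acceptance, i.e. A_n occurs
            tau1, z = n, (x, y)           # Z_n = (X_{n-1}, Y_n)
            x = y                         # X_n = Y_n
        # on rejection X_n = X_{n-1}, so x is left unchanged
    # On A_{tau1}: Z_{tau1} = (X_0, Y_{tau1}) = (X~_0, X~_1)
    assert z == (x0, x)
```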
Now, I would really like to conclude that \begin{equation}\begin{split}&\operatorname P\left[Z_n\in B_0\times B_1,A_n\right]\\&\;\;\;\;=\operatorname E\left[\prod_{i=1}^{n-1}\left(1-\alpha(X_0,Y_i)\right)\alpha(X_0,Y_n);(X_0,Y_n)\in B_0\times B_1\right]\\&\;\;\;\;=\left(\int\operatorname P\left[X_0\in{\rm d}x\right]\int Q(x,{\rm d}y)\left(1-\alpha(x,y)\right)\right)^{n-1}\int_{B_0}\operatorname P\left[X_0\in{\rm d}x\right]\int_{B_1}Q(x,{\rm d}y)\alpha(x,y)\end{split}\tag3\end{equation} for all $n\in\mathbb N$. However, while the first equality should hold by $(2)$, I don't see how to rigorously justify the second equality (though it is intuitively clear).
I think we need something like "$Y_1,\ldots,Y_n$ are conditionally independent given $X_0$ on $A_n$". The distribution of each $Y_i$ is $\mathcal L(X_{i-1})Q$. But on the right-hand side of the first equality of $(3)$, we somehow lost the information that $X_0=\cdots=X_{n-1}$ on $A_n$. Did I make a mistake before?
I don't think you can separate the integrals like that in the last equation. To see why, you could instead use the kernel on the augmented state space (call it $P$), and then integrate directly using the finite-dimensional distribution $\mathsf{P}_{x_0} := \delta_{x_0} \otimes P^{\otimes n}$. Also, your equation $(1)$ seems strange, because the $Z$ sequence itself depends on the uniform variates, so conditioning on $Z_1,\ldots,Z_n$ carries information about the $U_i$.
Details:
Assuming WLOG that each $Q(x,\;\cdot\;)$ has density $q(x,\;\cdot\;)$ with respect to Lebesgue measure (so $c_x=1$), the kernel on the augmented state space is $$ P[(x_0, y_0, u_0), (dx_1, dy_1, du_1)] = q(x_0, y_1) 1(u_1 \le \alpha(x_0, y_1))\,\delta_{y_1} (dx_1) \,du_1 \,dy_1 + q(x_0, y_1) 1(u_1 > \alpha(x_0, y_1))\,\delta_{x_0} (dx_1)\, du_1\, dy_1, $$ where $du_1$ denotes Lebesgue measure on $[0,1)$. In particular, if $x_0 \notin B$, then $$ P[ X_1 \in B \mid X_0 = x_0, Y_0 = y_0, U_0 = u_0] = \int_B q(x_0, y_1)\, \alpha(x_0, y_1) \,dy_1 $$
and $$ P[ X_1 =x_0 \mid X_0 = x_0, Y_0 = y_0, U_0 = u_0] = r(x_0). $$
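To make the augmented kernel concrete, here is a sketch of one step on a hypothetical two-point space $E=\{0,1\}$ (counting measure in place of Lebesgue, $c_x=1$). A Monte Carlo estimate of $P[X_1\in B\mid X_0=x_0,\ldots]$ for $x_0\notin B$ is compared with $\sum_{y\in B} q(x_0,y)\,\alpha(x_0,y)$; note that in the discrete setting the atom at $x_0$ has mass $q(x_0,x_0)\alpha(x_0,x_0)+r(x_0)$, and it is only on a continuous space, where the self-proposal term vanishes, that it equals $r(x_0)$ exactly as in the display above.

```python
import random

random.seed(1)

# Hypothetical example: target weights p, symmetric proposal q.
p = {0: 0.2, 1: 0.8}
q = {(x, y): 0.5 for x in (0, 1) for y in (0, 1)}

def alpha(x, y):
    return min(1.0, (p[y] * q[(y, x)]) / (p[x] * q[(x, y)]))

def step(x0):
    """One draw from the augmented kernel P[(x0, y0, u0), .]; returns (x1, y1, u1)."""
    y1 = random.choice((0, 1))            # Y_1 ~ q(x0, .)
    u1 = random.random()                  # U_1 ~ U[0, 1)
    x1 = y1 if u1 <= alpha(x0, y1) else x0
    return x1, y1, u1

N, x0, B = 200_000, 1, {0}                # x0 = 1 is not in B = {0}
hits_B = sum(step(x0)[0] in B for _ in range(N)) / N
stay = sum(step(x0)[0] == x0 for _ in range(N)) / N

exact_B = sum(q[(x0, y)] * alpha(x0, y) for y in B)              # = 0.125 here
r_x0 = 1.0 - sum(q[(x0, y)] * alpha(x0, y) for y in (0, 1))      # = 0.375 here
stay_exact = q[(x0, x0)] * alpha(x0, x0) + r_x0                  # = 0.875 here

assert abs(hits_B - exact_B) < 0.01
assert abs(stay - stay_exact) < 0.01
```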
So \begin{align*} &\mathsf{P}_{x_0}\left[Z_n\in B_0\times B_1,A_n\right] \\ &= \mathsf{P}_{x_0}\left[X_{n-1} \in B_0, Y_n \in B_1 ,U_1>\alpha(X_0, Y_1),\ldots,U_{n-1}>\alpha(X_{n-2}, Y_{n-1}),U_n\le\alpha(X_{n-1}, Y_{n})\right] \\ &= \mathsf{P}_{x_0}\left[g(x_0,y_0,u_0, \ldots, x_{n-1},y_{n-1},u_{n-1})\, 1(u_n \le \alpha(x_{n-1}, y_n))\,1(y_n \in B_1) \right] \\ &= \int \cdots \int P[(x_0, y_0, u_0), (dx_1, dy_1, du_1)] \cdots P[(x_{n-1}, y_{n-1}, u_{n-1}), (dx_n, dy_n, du_n)]\,g(x_0,y_0,u_0, \ldots, x_{n-1},y_{n-1},u_{n-1})\, 1(u_n \le \alpha(x_{n-1}, y_n))\,1(y_n \in B_1) \\ &= \int \cdots \int P[(x_0, y_0, u_0), (dx_1, dy_1, du_1)] \cdots P[(x_{n-2}, y_{n-2}, u_{n-2}), (dx_{n-1}, dy_{n-1}, du_{n-1})]\,g(x_0,y_0,u_0, \ldots, x_{n-1},y_{n-1},u_{n-1}) \int_{B_1} q(x_{n-1}, y_n)\, \alpha(x_{n-1}, y_n)\, dy_n \\ &= \int \cdots \int P[(x_0, y_0, u_0), (dx_1, dy_1, du_1)] \cdots P[(x_{n-2}, y_{n-2}, u_{n-2}), (dx_{n-1}, dy_{n-1}, du_{n-1})]\,\tilde{g}(x_0,y_0,u_0, \ldots, x_{n-2},y_{n-2},u_{n-2})\, 1(x_{n-1} \in B_0)\, 1(u_{n-1} > \alpha(x_{n-2}, y_{n-1}))\int_{B_1} q(x_{n-1}, y_n)\, \alpha(x_{n-1}, y_n) \,dy_n \end{align*} where $g(x_0,y_0,u_0, \ldots, x_{n-1},y_{n-1},u_{n-1}) := 1(x_{n-1}\in B_0)\prod_{i=1}^{n-1}1(u_i>\alpha(x_{i-1},y_i))$ and $\tilde{g}$ collects the factors of $g$ not involving $(x_{n-1},y_{n-1},u_{n-1})$.
In the last line, when you start integrating with respect to $x_{n-1}$, $y_{n-1}$ and $u_{n-1}$, you're not just integrating those indicators: the $\int_{B_1} q(x_{n-1}, y_n)\, \alpha(x_{n-1}, y_n)\, dy_n$ term, which depends on $x_{n-1}$, is in there too.
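The failure of the separation can also be seen numerically. On a hypothetical two-point space with $p(0)=0.2$, $p(1)=0.8$, a symmetric proposal, $X_0$ uniform on $\{0,1\}$, $B_0=\{1\}$, $B_1=\{0\}$ and $n=2$, the sketch below computes $\operatorname P[Z_n\in B_0\times B_1, A_n]$ exactly (keeping everything inside one integral over $x$, since $X_0=\cdots=X_{n-1}$ on $A_n$) and compares it with the separated right-hand side of $(3)$; the two values differ.

```python
# Hypothetical counterexample: E = {0, 1}, counting measure, symmetric proposal.
p = {0: 0.2, 1: 0.8}
Q = {(x, y): 0.5 for x in (0, 1) for y in (0, 1)}

def alpha(x, y):
    return min(1.0, p[y] / p[x])          # q symmetric, so it cancels

mu0 = {0: 0.5, 1: 0.5}                    # law of X_0
B0, B1, n = {1}, {0}, 2

def rho(x):                               # one-step rejection probability from x
    return sum(Q[(x, y)] * (1 - alpha(x, y)) for y in (0, 1))

def acc(x, B):                            # integral of Q(x, dy) alpha(x, y) over B
    return sum(Q[(x, y)] * alpha(x, y) for y in B)

# Correct value: the x-integral is NOT split, since X_0 = ... = X_{n-1} on A_n.
correct = sum(mu0[x] * rho(x) ** (n - 1) * acc(x, B1) for x in B0)

# The separated right-hand side of (3):
separated = (sum(mu0[x] * rho(x) for x in (0, 1)) ** (n - 1)
             * sum(mu0[x] * acc(x, B1) for x in B0))

print(correct, separated)                 # 0.0234375 vs 0.01171875
```

The discrepancy comes from $\rho(0)=0\ne\rho(1)$: the rejection probability depends on $x$, so its $(n-1)$-st power cannot be pulled out of the integral over the law of $X_0$.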
Other notes: