What can we say about a regular version of the conditional distribution given a random variable $X$ on the set $\left\{X=x\right\}$?


Let

  • $(\Omega,\mathcal A,\operatorname P)$ be a probability space
  • $(E_i,\mathcal E_i)$ be a measurable space
  • $X_i:\Omega\to E_i$ for $i\in\left\{1,2\right\}$

Assume $X_2$ is $(\mathcal A,\mathcal E_2)$-measurable and that there is a regular version $\kappa$ of the conditional distribution of $X_2$ given $X_1$ on $(\Omega,\mathcal A,\operatorname P)$, i.e. $\kappa$ is a Markov kernel with source $(\Omega,\sigma(X_1))$ and target $(E_2,\mathcal E_2)$ with $$\operatorname P\left[X_2\in B_2\mid X_1\right]=\kappa(\;\cdot\;,B_2)\;\;\;\text{almost surely for all }B_2\in\mathcal E_2\tag1.$$
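For reference, $(1)$ together with the kernel property unpacks to the following standard defining conditions (nothing here goes beyond the usual definition of conditional probability given $\sigma(X_1)$):

$$\begin{aligned}&\text{(i) }\kappa(\omega,\;\cdot\;)\text{ is a probability measure on }(E_2,\mathcal E_2)\text{ for all }\omega\in\Omega;\\&\text{(ii) }\kappa(\;\cdot\;,B_2)\text{ is }\sigma(X_1)\text{-measurable for all }B_2\in\mathcal E_2;\\&\text{(iii) }\int_A\kappa(\omega,B_2)\,\operatorname P(\mathrm d\omega)=\operatorname P\left[A\cap\left\{X_2\in B_2\right\}\right]\text{ for all }A\in\sigma(X_1)\text{ and }B_2\in\mathcal E_2.\end{aligned}$$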

I've often seen the claim (see, for example, Definition 8.28) that $$E_1\times\mathcal E_2\ni(x_1,B_2)\mapsto\kappa\left(X_1^{-1}\left(\left\{x_1\right\}\right),B_2\right)=\operatorname P\left[X_2\in B_2\mid X_1=x_1\right]\tag2$$ is a Markov kernel with source $(E_1,\mathcal E_1)$ and target $(E_2,\mathcal E_2)$. (Compare this with the strange-looking claim on Wikipedia about the "topological support of the distribution of $X_1$ under $\operatorname P$".)

I see neither why $(2)$ is well-defined nor why the claimed equality holds. The author is assuming that the value in $(2)$ is chosen arbitrarily if $x_1\not\in X_1(\Omega)$, in which case $X_1^{-1}\left(\left\{x_1\right\}\right)=\emptyset$. That's not the problem. The most obvious problem is that $\left\{x_1\right\}$ might not belong to $\mathcal E_1$. However, let's assume that every singleton set is contained in $\mathcal E_1$. We still have the problem that the right-hand side of $(2)$ is $0$ if $\operatorname P\left[X_1=x_1\right]=0$, and that might be the case for all $x_1$ (for example, if the distribution of $X_1$ under $\operatorname P$ has a density with respect to the Lebesgue measure).

Having said that, what's clear to me is that we can easily find (without any assumption on $(E_1,\mathcal E_1)$) a Markov kernel $\tilde\kappa$ with source $(E_1,\mathcal E_1)$ and target $(E_2,\mathcal E_2)$ with $$\operatorname P\left[X_2\in B_2\mid X_1\right]=\tilde\kappa(X_1,B_2)\;\;\;\text{almost surely for all }B_2\in\mathcal E_2\tag3$$ and $\tilde\kappa(\;\cdot\;,B_2)$ is uniquely determined on $X_1(\Omega)$ for all $B_2\in\mathcal E_2$. By $(3)$ we immediately obtain $$\operatorname P\left[\left(X_1,X_2\right)\in\;\cdot\;\right]=\operatorname P\left[X_1\in\;\cdot\;\right]\otimes\tilde\kappa\tag4$$ (where the right-hand side denotes the product of transition kernels).
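The step from $(3)$ to $(4)$ can be spelled out on measurable rectangles. For $B_1\in\mathcal E_1$ and $B_2\in\mathcal E_2$,

$$\begin{aligned}\operatorname P\left[X_1\in B_1,\,X_2\in B_2\right]&=\operatorname E\left[1_{B_1}(X_1)\operatorname P\left[X_2\in B_2\mid X_1\right]\right]\\&=\operatorname E\left[1_{B_1}(X_1)\,\tilde\kappa(X_1,B_2)\right]\\&=\int_{B_1}\tilde\kappa(x_1,B_2)\,\operatorname P\left[X_1\in\mathrm dx_1\right]\\&=\left(\operatorname P\left[X_1\in\;\cdot\;\right]\otimes\tilde\kappa\right)(B_1\times B_2),\end{aligned}$$

where the first equality uses $\left\{X_1\in B_1\right\}\in\sigma(X_1)$ and the defining property of conditional probability, the second uses $(3)$, and the third is the change-of-variables formula. Since the rectangles $B_1\times B_2$ form a $\cap$-stable generator of $\mathcal E_1\otimes\mathcal E_2$, both sides of $(4)$ agree as probability measures.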

We trivially obtain $$\operatorname P\left[X_2\in B_2\mid X_1=x_1\right]=\tilde\kappa(x_1,B_2)\tag5$$ for all $x_1\in E_1$ with $\operatorname P\left[X_1=x_1\right]>0$ (of which there may be none) and all $B_2\in\mathcal E_2$. Is there anything I'm missing?
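In the finite case the step from $(4)$ to $(5)$ can be checked mechanically: taking $B_1=\left\{x_1\right\}$ in $(4)$ gives $\operatorname P\left[X_1=x_1,X_2\in B_2\right]=\operatorname P\left[X_1=x_1\right]\tilde\kappa(x_1,B_2)$, and dividing by $\operatorname P\left[X_1=x_1\right]>0$ yields $(5)$. The sketch below assumes a hypothetical toy distribution (the sets, probabilities and function names are my own choices, not from any source) and uses exact rational arithmetic. The point `"c"` has probability zero, so the kernel row there is arbitrary and the elementary quotient is undefined.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical toy example: E1 = {"a", "b", "c"} and E2 = {0, 1},
# both equipped with their power sets.
E1 = ["a", "b", "c"]
E2 = [0, 1]

# Assumed distribution of X1 under P; the point "c" has probability zero.
mu = {"a": Fraction(1, 3), "b": Fraction(2, 3), "c": Fraction(0)}

# Assumed Markov kernel from E1 to E2, given by its point masses k(x1, {x2}).
# The row for "c" is arbitrary: it never affects the joint distribution.
k_point = {
    "a": {0: Fraction(1, 4), 1: Fraction(3, 4)},
    "b": {0: Fraction(1, 2), 1: Fraction(1, 2)},
    "c": {0: Fraction(1), 1: Fraction(0)},
}


def kernel(x1, B2):
    """k(x1, B2): sum of the point masses k(x1, {x2}) over x2 in B2."""
    return sum((k_point[x1][x2] for x2 in B2), Fraction(0))


def joint(B1, B2):
    """Product of mu and k on B1 x B2: sum over x1 in B1 of mu({x1}) * k(x1, B2), as in (4)."""
    return sum((mu[x1] * kernel(x1, B2) for x1 in B1), Fraction(0))


def subsets(s):
    """All subsets of the finite collection s, i.e. its power set."""
    return [set(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]


def elementary_conditional(x1, B2):
    """P[X2 in B2 | X1 = x1] as the quotient P[X1 = x1, X2 in B2] / P[X1 = x1]."""
    return joint({x1}, B2) / mu[x1]


# Each row of the kernel is a probability measure, and the joint law has total mass 1.
assert all(kernel(x1, E2) == 1 for x1 in E1)
assert joint(E1, E2) == 1

# (5): wherever P[X1 = x1] > 0, the elementary quotient equals the kernel.
for x1 in E1:
    if mu[x1] > 0:
        for B2 in subsets(E2):
            assert elementary_conditional(x1, B2) == kernel(x1, B2)

# Where P[X1 = x1] = 0 the quotient is 0/0, so (5) says nothing about k(x1, .).
try:
    elementary_conditional("c", {0})
except ZeroDivisionError:
    print("P[X1 = 'c'] = 0: the quotient is undefined")
```

Changing the row `k_point["c"]` to any other probability vector leaves `joint` unchanged, which mirrors the fact that $\tilde\kappa$ is only pinned down where $X_1$ actually puts mass.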

EDIT: I've just observed that $$\operatorname P\left[X_2\in B_2\mid X_1\right]=\tilde\kappa(x_1,B_2)\;\;\;\text{on }\left\{X_1=x_1\right\}\text{ for all }x_1\in X_1(\Omega)\text{ almost surely}\tag6$$ for all $B_2\in\mathcal E_2$ (note that the null set in $(6)$ depends only on $B_2$, not on $x_1$); maybe that's what's actually meant.
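With the quantifiers made explicit (one way to read $(6)$), the statement for fixed $B_2\in\mathcal E_2$ is

$$\exists N_{B_2}\in\mathcal A\text{ with }\operatorname P\left[N_{B_2}\right]=0:\quad\operatorname P\left[X_2\in B_2\mid X_1\right](\omega)=\tilde\kappa(x_1,B_2)\;\;\;\text{for all }x_1\in X_1(\Omega)\text{ and }\omega\in\left\{X_1=x_1\right\}\setminus N_{B_2}.$$

Since $\omega\in\left\{X_1=x_1\right\}$ means exactly $x_1=X_1(\omega)$, this says $\operatorname P\left[X_2\in B_2\mid X_1\right](\omega)=\tilde\kappa(X_1(\omega),B_2)$ for all $\omega\in\Omega\setminus N_{B_2}$, i.e. it is a pointwise restatement of $(3)$.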