If $Y\sim\mu$ with probability $p$ and $Y\sim\kappa(X,\;\cdot\;)$ otherwise, what's the conditional distribution of $Y$ given $X$?


Let

  • $(\Omega,\mathcal A,\operatorname P)$ be a probability space
  • $(E,\mathcal E)$ be a measurable space
  • $\mu$ be a probability measure on $(E,\mathcal E)$
  • $X$ be an $(E,\mathcal E)$-valued random variable on $(\Omega,\mathcal A,\operatorname P)$
  • $\kappa$ be a Markov kernel on $(E,\mathcal E)$
  • $p\in[0,1]$

Assume we construct an $(E,\mathcal E)$-valued random variable $Y$ on $(\Omega,\mathcal A,\operatorname P)$ in the following way: With probability $p$ we draw $Y$ from $\mu$ and with probability $1-p$ we draw $Y$ from $\kappa(X,\;\cdot\;)$.

What's the conditional distribution $\operatorname P\left[Y\in\;\cdot\;\mid X\right]$ of $Y$ given $X$? In particular, I want to determine the Markov kernel $Q$ on $(E,\mathcal E)$ such that $$\operatorname P\left[Y\in B\mid X\right]=Q(X,B)\;\;\;\text{almost surely for all }B\in\mathcal E.\tag1$$

In order to give a rigorous answer, I think that we need to introduce a $\{0,1\}$-valued $p$-Bernoulli distributed random variable $Z$ on $(\Omega,\mathcal A,\operatorname P)$ such that

  1. $X$ and $Z$ are independent
  2. $X$ and $Y$ are independent given $\{Z=1\}$
  3. $\operatorname P\left[Y\in B\mid Z=1\right]=\mu(B)$ for all $B\in\mathcal E$
  4. $\operatorname P\left[Y\in B\mid X\right]=\kappa(X,B)$ almost surely on $\{Z=0\}$ for all $B\in\mathcal E$

At first glance, I thought this would be an easy task, but I don't know how to proceed. First of all, is my (supposedly equivalent) description of the problem in terms of the random variable $Z$ correct, or did I impose a false assumption?

If the description is correct, how should we proceed?

Please take note of this related question: If we sample with a fixed probability from a distribution, what does this rigorously mean?


There are 2 answers below.

BEST ANSWER

Some notation. When $\nu$ is a probability measure on a space $E$ and $\kappa$ is a Markov kernel on the same space, the semidirect product $\nu\rtimes \kappa$ is the measure on $E\times E$ (equipped with the product $\sigma$-algebra) satisfying $$ (\nu\rtimes \kappa)(A\times B)=\nu(1_A\cdot \kappa 1_B). $$ It is the law of the first two steps of a Markov chain with initial distribution $\nu$ and transition kernel $\kappa$.
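On a finite state space the semidirect product is just a weighted outer product, which may make the definition concrete. Here is a minimal sketch with $E=\{0,1\}$; the particular vectors `nu` and `kappa` are illustrative assumptions, not part of the question.

```python
# Semidirect product nu ⋊ kappa on a finite space E = {0, 1}.
# nu is a probability vector; kappa is a row-stochastic matrix (Markov kernel),
# so kappa[a] is the probability vector kappa(a, .).
nu = [0.3, 0.7]
kappa = [[0.9, 0.1],
         [0.4, 0.6]]

def semidirect(nu, kappa):
    """Return the measure on E x E with mass nu[a] * kappa[a][b] at (a, b)."""
    return {(a, b): nu[a] * kappa[a][b]
            for a in range(len(nu)) for b in range(len(kappa[a]))}

m = semidirect(nu, kappa)

# (nu ⋊ kappa)(A x B) = sum_{a in A} nu[a] * kappa(a, B); in particular the
# total mass is 1 and the first marginal is nu.
assert abs(sum(m.values()) - 1.0) < 1e-12
assert abs(m[(0, 0)] + m[(0, 1)] - nu[0]) < 1e-12
```

The dictionary `m` is exactly the joint law of the first two steps of the chain started from `nu`.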

Formalizing the question. Let Ber$_p$ denote the probability measure on $\{0,1\}$ satisfying Ber$_p(\{1\})=p$. Consider the enlarged sample space $\Gamma=E^3\times \{0,1\}$ with the product $\sigma$-algebra, and equip $\Gamma$ with the probability measure $\mathbb P=\mu\otimes(\nu\rtimes \kappa)\otimes \textrm{Ber}_p$, where $\nu$ denotes the law of $X$.

Consider the function $f\colon \Gamma\to E$ given by $$ f(w,x,y,z)=\begin{cases}y,& z = 0\\ w,& z = 1\end{cases}. $$ When $f$ is regarded as a random element of $E$, it is precisely the result of "sampling from $\mu$ with probability $p$ and from $\kappa(X,\cdot)$ with probability $1-p$" in the way you have described.
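The construction of $f$ on the enlarged space $\Gamma=E^3\times\{0,1\}$ can be sampled directly. The following sketch draws $(w,x,y,z)$ from the product measure $\mu\otimes(\nu\rtimes\kappa)\otimes\mathrm{Ber}_p$ and returns $f(w,x,y,z)$; all concrete distributions here are illustrative assumptions.

```python
import random

random.seed(0)
E = [0, 1]
mu = [0.5, 0.5]        # law of w
nu = [0.3, 0.7]        # law of x
kappa = [[0.9, 0.1],
         [0.4, 0.6]]   # Markov kernel: kappa[x] is kappa(x, .)
p = 0.25

def draw(weights):
    return random.choices(E, weights=weights)[0]

def sample_f():
    w = draw(mu)                          # w ~ mu
    x = draw(nu)                          # x ~ nu
    y = draw(kappa[x])                    # y ~ kappa(x, .), so (x, y) ~ nu ⋊ kappa
    z = 1 if random.random() < p else 0   # z ~ Ber_p, independent of (w, x, y)
    return w if z == 1 else y             # f(w, x, y, z)

samples = [sample_f() for _ in range(10)]
assert all(s in E for s in samples)
```

Note that `sample_f` is literally "with probability $p$ draw from $\mu$, otherwise draw from $\kappa(x,\cdot)$", expressed through the auxiliary coin $z$.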

Phrased in this precise and rigorous way, your question asks the following.

Reformulated question. For any $B\in\mathcal E$, determine the conditional probability $\mathbb P(f\in B\mid x)$.

You have guessed a formula for this conditional probability, which we will now verify.

Claim. The random variable $(1-p)\kappa(x, B)+p\mu(B)$ on $\Gamma$ is a version of $\mathbb P(f\in B\mid x)$.

In the proof of this claim, we will use notation like $\mathbb E[\textrm{variable};\textrm{conditions}]$ as a shorthand for the expectation of (variable times the indicator of the conditions) with respect to $\mathbb P$.

Proof. Unwinding the definition of conditional probability, the claim amounts to showing that $$ \mathbb P(f\in B,x\in A)=(1-p)\mathbb E[\kappa(x, B);x\in A]+p\mu(B)\mathbb P(x\in A)\tag{1}, $$ for all sets $A\in \mathcal E$. Splitting up the left side, we see that $$ \mathbb P(f\in B,x\in A)=\mathbb P(f\in B,z=0,x\in A)+\mathbb P(f\in B,z=1,x\in A). $$ On $z=0$, we have $f=y$ and on $z=1$, we have $f=w$. Thus $$ \mathbb P(f\in B,x\in A)=\mathbb P(y\in B,z=0,x\in A)+\mathbb P(w\in B,z=1,x\in A). $$ Using independence (coming from the product structure of $\mathbb P$) then yields $$ \mathbb P(f\in B,x\in A)=(1-p)\mathbb P(y\in B,x\in A)+p\mu(B)\mathbb P(x\in A). $$ Recalling that the law of $(x,y)$ is $\nu\rtimes \kappa$ and directly applying the definition of the semidirect product yields $\mathbb P(y\in B,x\in A)=\mathbb E[\kappa(x,B);x\in A]$. Substituting this into the previous display yields $(1)$, establishing the claim.
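Identity $(1)$ can also be checked exactly on a finite state space by summing over all of $\Gamma$, without any simulation. The concrete numbers and test sets below are illustrative assumptions.

```python
# Verify identity (1):
#   P(f in B, x in A) = (1-p) E[kappa(x,B); x in A] + p mu(B) P(x in A)
# on E = {0, 1} by exhaustive summation over Gamma = E^3 x {0, 1}.
E = [0, 1]
mu = [0.5, 0.5]
nu = [0.3, 0.7]
kappa = [[0.9, 0.1],
         [0.4, 0.6]]
p = 0.25
A, B = {0}, {1}      # arbitrary test sets

# Left side: sum the product measure over {f in B, x in A}.
lhs = sum(mu[w] * nu[x] * kappa[x][y] * (p if z == 1 else 1 - p)
          for w in E for x in E for y in E for z in (0, 1)
          if x in A and (w if z == 1 else y) in B)

# Right side: (1-p) E[kappa(x, B); x in A] + p mu(B) P(x in A).
rhs = ((1 - p) * sum(nu[x] * sum(kappa[x][b] for b in B) for x in A)
       + p * sum(mu[b] for b in B) * sum(nu[x] for x in A))

assert abs(lhs - rhs) < 1e-12
```

Since $A$ and $B$ range over a finite $\sigma$-algebra here, checking all rectangles would verify the claim completely in this toy setting.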

ANSWER

Maybe we need to formulate this differently. (If I'm wrong and the following description is not equivalent to the situation described in the question, please let me know.)

Let's replace 3. and 4. by

  5. $\operatorname P\left[Y\in B\mid X,Z\right]=\mu(B)$ almost surely on $\left\{Z=1\right\}$ for all $B\in\mathcal E$
  6. $\operatorname P\left[Y\in B\mid X,Z\right]=\kappa(X,B)$ almost surely on $\left\{Z=0\right\}$ for all $B\in\mathcal E$

Moreover, discard 2. (I think the independence in 2. is already expressed in 5., since the conditional probability there does not depend on $X$ - maybe someone can elaborate on this in the comments) and keep only 1.

Let $B\in\mathcal E$. By 5. and 6., $$\operatorname P\left[Y\in B\mid X,Z\right]=1_{\left\{\:Z\:=\:1\:\right\}}\mu(B)+1_{\left\{\:Z\:=\:0\:\right\}}\kappa(X,B)\;\;\;\text{almost surely}.\tag2$$ By 1., $$\operatorname P\left[Z=1\mid X\right]=\operatorname P\left[Z=1\right]\;\;\;\text{almost surely}\tag3$$ and $$\operatorname E\left[1_{\left\{\:Z\:=\:0\:\right\}}\kappa(X,B)\mid X\right]=\operatorname P\left[Z=0\mid X\right]\kappa(X,B)=\operatorname P\left[Z=0\right]\kappa(X,B)\;\;\;\text{almost surely}.\tag4$$ Thus, \begin{equation} \begin{split} \operatorname P\left[Y\in B\mid X\right]&=\operatorname E\left[\operatorname P\left[Y\in B\mid X,Z\right]\mid X\right]\\&=\operatorname E\left[1_{\left\{\:Z\:=\:1\:\right\}}\mu(B)+1_{\left\{\:Z\:=\:0\:\right\}}\kappa(X,B)\mid X\right]\\&=\operatorname P\left[Z=1\mid X\right]\mu(B)+\operatorname E\left[1_{\left\{\:Z\:=\:0\:\right\}}\kappa(X,B)\mid X\right]\\&=p\mu(B)+(1-p)\kappa(X,B) \end{split}\tag5 \end{equation}

almost surely.

So, the desired Markov kernel should be $$Q(x,\;\cdot\;):=p\mu+(1-p)\kappa(x,\;\cdot\;)\;\;\;\text{for }x\in E.$$ (Note that the convex combination of probability measures is a probability measure.)
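As a final sanity check, on a finite state space one can compute the exact joint law of $(X,Y)$ under the construction above and confirm that the conditional distribution is the mixture kernel $Q(x,\cdot)=p\mu+(1-p)\kappa(x,\cdot)$. The concrete numbers are illustrative assumptions.

```python
# Compute the exact joint law of (x, f) on Gamma = E^3 x {0, 1} and check
#   P(f = b | x = a) = p * mu[b] + (1 - p) * kappa[a][b]  =  Q(a, {b}).
E = [0, 1]
mu = [0.5, 0.5]
nu = [0.3, 0.7]
kappa = [[0.9, 0.1],
         [0.4, 0.6]]
p = 0.25

# joint[a][b] = P(x = a, f = b), summed over w, y, z under the product measure
joint = [[0.0, 0.0] for _ in E]
for w in E:
    for a in E:                # value of x
        for y in E:
            for z in (0, 1):
                prob = mu[w] * nu[a] * kappa[a][y] * (p if z == 1 else 1 - p)
                f = w if z == 1 else y
                joint[a][f] += prob

for a in E:
    for b in E:
        cond = joint[a][b] / nu[a]                 # P(f = b | x = a)
        mix = p * mu[b] + (1 - p) * kappa[a][b]    # Q(a, {b})
        assert abs(cond - mix) < 1e-12
```

The assertion passes for every pair $(a,b)$, matching the derivation in $(5)$: conditioning on $x$ averages out the coin $z$, leaving the convex combination $p\mu+(1-p)\kappa(x,\cdot)$.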