"Plugging into" conditional probability - why does it work?


I think the best way to ask this question is using an example.

Let $X$ be a continuous random variable and $Y$ a (not necessarily continuous) random variable that is independent of $X.$ Consider the following proof that $P(X=Y)=0.$

Write $$P(X=Y) = E[P(X=Y|Y)] = E[E(1_{\{X=Y\}}|Y)] = E[E(Z|Y)], \tag{1}$$ where $Z=1_{\{X=Y\}}.$

By the factorization lemma, there exists a measurable function $g$ such that $E(Z|Y) = g(Y)$ (almost surely). One often writes $E(Z|Y=y):= g(y).$ Now, for any $y\in \bar{\mathbb R}$ it holds that \begin{align} g(y)&=E(Z|Y=y)\\ &= P(X=Y|Y=y)\text{ (by $(1)$) }\\ &=P(X=y|Y=y) \tag{2}\\ &=P(X=y) \text{ (by independence) } \tag{3}\\ &=0 \text{ (since $X$ is a continuous r.v.) } \end{align}

This implies that $P(X=Y|Y)=0$ a.s. and hence $P(X=Y) = E[P(X=Y|Y)] = 0.$
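As a quick sanity check of the claimed conclusion (my illustration, not part of the question): simulating a continuous $X$ and an independent discrete $Y$ never produces an exact tie.

```python
import random

# Illustrative simulation (not a proof): X is continuous (standard normal),
# Y is discrete (a fair coin on {0, 1}), independent of X.
# An exact tie X == Y is a probability-zero event, so no matches occur.
random.seed(0)
n = 100_000
matches = sum(
    1
    for _ in range(n)
    if random.gauss(0.0, 1.0) == random.choice([0.0, 1.0])
)
print(matches)  # 0
```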

Question: Why is step $(2)$ valid? (is it?)

Why can we plug $Y=y$ into the probability? This is "notationally obvious", but I'm looking for a rigorous explanation.

Edit: Thinking about this, I'm also unsure about $(3).$ The probability of $Y=y$ might be zero so independence doesn't help.

There are 2 answers below.

Answer 1:

The random variables $X$ and $Y$ are measurable functions $X,Y\colon \Omega\to\mathbb R$ on the probability space $\Omega$. Let $\mathcal F\subseteq 2^\Omega$ denote the $\sigma$-algebra of measurable events and recall that the probability measure is a map $P\colon\mathcal F\to[0,1]$, and that expressions like $X=Y$ and $Y=y$ are just shorthand for the events $\{\omega\in\Omega:X(\omega)=Y(\omega)\}$ and $\{\omega\in\Omega : Y(\omega)=y\}=Y^{-1}(y)$, respectively.

The "substitution" you mention boils down to the following equality of subsets of $\Omega$: $$ \bigg((X=Y)\cap(Y=y)\bigg) = \bigg( (X=y) \cap (Y=y) \bigg). $$

Indeed, both are equal to $\{\omega\in\Omega : X(\omega)=Y(\omega)=y\}$.
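This set identity can be checked mechanically on a finite sample space (a toy example of mine, not from the answer):

```python
# Toy check: on a finite sample space omega, the event
# {X = Y} ∩ {Y = y} coincides with {X = y} ∩ {Y = y} for every y,
# since both are {w : X(w) = Y(w) = y}.
omega = range(12)
X = {w: w % 3 for w in omega}  # hypothetical random variables on omega
Y = {w: w % 4 for w in omega}

identity_holds = all(
    {w for w in omega if X[w] == Y[w] and Y[w] == y}
    == {w for w in omega if X[w] == y and Y[w] == y}
    for y in range(4)
)
print(identity_holds)  # True
```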

Regardless of how exactly you define $P(\,-\,|\,Y=y\,)$ in case that $Y=y$ has measure zero, it will be a probability measure on $Y=y$ (that is, on the subspace $Y^{-1}(y)\subseteq\Omega$). Let $X|_{Y=y}$ and $Y|_{Y=y}$ denote the restrictions of $X$ and $Y$ to that subspace, then the above equality becomes an equality of measurable subsets of $Y=y$: $$ \bigg(X|_{Y=y} = Y|_{Y=y}\bigg) = \bigg(X|_{Y=y} = y\bigg). $$

Hence, both sets have the same measure under $P(\,-\,|\,Y=y\,)$.

Answer 2:

Not really an answer to your question, but a more convenient route that leads to $P(X=Y)=0$ in this context.

For $x,y\in\mathbb R$, define the indicator bracket:

  • $[x=y]=1$ if $x=y$, and $[x=y]=0$ otherwise.

Then, because $X$ and $Y$ are independent and $X$ has a continuous distribution, we have:

$$P(X=Y)=\mathbb E\big[[X=Y]\big]=\iint[x=y]\,dF_X(x)\,dF_Y(y)=\int P(X=y)\,dF_Y(y)=\int 0\,dF_Y(y)=0$$
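For a concrete instance (my example, not from the answer): take $Y$ a fair coin on $\{0,1\}$, independent of a continuous $X$. The inner integral collapses to a sum over the two atoms of $Y$: $$P(X=Y)=P(X=0)\,P(Y=0)+P(X=1)\,P(Y=1)=0\cdot\tfrac12+0\cdot\tfrac12=0.$$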


Concerning your question about step 2:

Trouble arises if we define $P(A\mid B):=P(A\cap B)/P(B)$, because the denominator may equal $0$.

In my view it is better to define $P(A\mid B)$ indirectly, by stating that $p=P(A\mid B)$ whenever $p\cdot P(B)=P(A\cap B)$.

Then, if indeed $P(B)=0$, there are several candidates for $p$, and we can pick out the one that is most suitable (and corresponds to our intuition).
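To see why several candidates appear (a one-line check, assuming the definition above): if $P(B)=0$, then also $P(A\cap B)\le P(B)=0$, so the defining equation reads $$p\cdot 0=0,$$ which holds for every $p\in[0,1]$.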

In this context, note that $P(X=Y\mid Y=y)\,P(Y=y)$ and $P(X=y\mid Y=y)\,P(Y=y)$ both equal $$P(X=Y\wedge Y=y)=P(X=y\wedge Y=y).$$