Rigorous definition of the conditional expectations $E(X|Y=y)$ when $P(Y=y)=0$


Let $X$ be an integrable random variable on $(\Omega, \mathfrak A, P)$.

I've learned that for an event $A$ of non-zero probability, $$ E(X|A) = \int X(\omega) \,dP(\omega|A) = \frac{1}{P(A)}\int_A X \,dP,\tag 1 $$ where $P(\cdot|A)$ is the probability measure on $\mathfrak A$ given by $B\mapsto P(B \cap A)/P(A).$
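
As a quick numerical illustration of $(1)$, here is a minimal Monte Carlo sketch; the particular $X$ and $A$ below are illustrative choices of mine, not anything from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice: X = Z^2 with Z standard normal, A = {Z > 0}.
z = rng.standard_normal(1_000_000)
x = z**2
a = z > 0  # indicator (boolean mask) of the event A

# Both sides of (1): the mean of X under P(.|A), and E(X 1_A) / P(A).
print(x[a].mean())                # empirical mean of X over A
print((x * a).mean() / a.mean())  # E(X 1_A) / P(A)
# Both are approximately E(Z^2 | Z > 0) = 1, by symmetry of Z^2.
```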

Moreover, for a sub-$\sigma$-algebra $\mathfrak C \subset \mathfrak A$, $E(X|\mathfrak C)$ is defined (uniquely up to a.s. equality) as an integrable random variable satisfying

  1. $E(X|\mathfrak C)$ is $\mathfrak C$-measurable
  2. $\forall C \in \mathfrak C,\, \int_C E(X|\mathfrak C) \,dP = \int_C X \,dP.$

For an integrable random variable $X$ and an arbitrary random variable $Y$, $E(X|Y)$ is defined as $E(X|\sigma(Y)).$
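
For a discrete $Y$, the defining property 2 above can be checked numerically. The following is a hedged sketch with an illustrative choice of $X$ and $Y$; on each atom $\{Y=k\}$ the conditional expectation reduces to formula $(1)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: Y uniform on {0, 1, 2}, X = Y + noise.
n = 1_000_000
y = rng.integers(0, 3, size=n)
x = y + rng.standard_normal(n)

# E(X | Y) is sigma(Y)-measurable, hence constant on each atom {Y = k};
# there it equals E(X 1_{Y=k}) / P(Y = k), i.e. formula (1) with A = {Y = k}.
cond_exp = np.zeros(n)
for k in range(3):
    cond_exp[y == k] = x[y == k].mean()

# Property 2: for C in sigma(Y), e.g. C = {Y in {0, 2}}, the integrals
# of E(X | Y) and of X over C agree.
c = np.isin(y, (0, 2))
print((cond_exp * c).mean(), (x * c).mean())  # equal up to rounding
```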

But very often I see something like $E(X|Y=y)$ written. If $P(Y=y)>0$, this reduces to $(1)$ with $A=\{Y=y\}$, but if the distribution of $Y$ is, for example, absolutely continuous with respect to Lebesgue measure, then $P(Y=y)=0$ for every $y$ and this doesn't work.

Is there a general definition of $E(X|Y=y)$ that works for discrete and continuous $Y$, and also in the case where $Y$ may not have a density?

Accepted answer:

A piece of your own, quite lucid, exposition whose importance might have escaped you is the fact that, since the random variable $E(X\mid Y)$ is, by definition, $\sigma(Y)$-measurable, there exists some measurable function $g:\mathbb R\to\mathbb R$ such that $E(X\mid Y)=g(Y)$ almost surely (this is the Doob–Dynkin lemma). Then, indeed, it is customary to set $E(X\mid Y=y)=g(y)$ for every $y$.
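
Concretely, $g$ is the "regression function" $y\mapsto E(X\mid Y=y)$, and one can recover it approximately by averaging $X$ over narrow bins of $Y$, in the spirit of $(1)$. A rough sketch, with an illustrative pair $(X,Y)$ of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative pair: Y uniform on (0, 1), X = Y^2 + noise,
# so that g(y) = E(X | Y = y) = y^2.
n = 1_000_000
y = rng.uniform(0, 1, size=n)
x = y**2 + 0.1 * rng.standard_normal(n)

# Estimate g on a grid by averaging X over narrow bins {edges[k] <= Y < edges[k+1]}:
# each bin average approximates E(X 1_{Y in bin}) / P(Y in bin).
edges = np.linspace(0, 1, 51)
idx = np.digitize(y, edges) - 1
g_hat = np.array([x[idx == k].mean() for k in range(50)])

centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(g_hat - centers**2)))  # small: g_hat tracks g(y) = y^2
```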

But note this: if some function $g$ fits the bill, then every other measurable function $\bar g$ such that $\bar g=g$ $P_Y$-almost surely (where $P_Y$ denotes the distribution of $Y$) also does. Hence $E(X\mid Y=y)$ is not uniquely defined pointwise, except at the points $y$ such that $P(Y=y)\ne0$: there, necessarily, $E(X\mid Y=y)=E(X\mathbf 1_{Y=y})/P(Y=y)$, so all is well.

This latitude in the choice of the function $g$ reflects the fact that the random variable $E(X\mid Y)$ is only defined up to $P$-null sets and that, rigorously speaking, one should always write $E(X\mid Y)=g(Y)$ almost surely, instead of simply $E(X\mid Y)=g(Y)$.

Then $E(X\mid Y)=g(Y)$ almost surely, $E(X\mid Y)=\bar g(Y)$ almost surely, and $g(Y)=\bar g(Y)$ almost surely (the last assertion is what the hypothesis that $g=\bar g$ $P_Y$-almost surely amounts to), hence the latitude mentioned above does not lead to a contradiction.

This also explains why one rarely sees formulas involving $E(X\mid Y=y)$ in measure-theoretic probability lectures, but rather the random variables $E(X\mid Y)$ themselves, uniquely defined up to $P$-null sets.

Example: Consider $X=Y$ uniform on $(0,1)$. Then $g$ the identity function works, but so does the function $\bar g:\mathbb R\to\mathbb R$ defined by $\bar g(y)=y\,\mathbf 1_{y\notin\mathbb Q}$, and $g(y)\ne\bar g(y)$ for every nonzero rational $y$. One sees that, for each specific $y$ such that $P(Y=y)=0$, the value of $g(y)=E(X\mid Y=y)$ can be anything without failing the definition. In the end, the only property that matters is that $E(X\mathbf 1_{Y\in B})=E(g(Y)\mathbf 1_{Y\in B})$ for every Borel set $B\subset\mathbb R$.
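
To make that last identity concrete, here is a minimal numerical sketch under the setup of the example; since every floating-point number is rational, the null set $\mathbb Q$ is replaced by the single point $1/2$, which is equally $P_Y$-null (everything below is an illustrative assumption, not part of the answer):

```python
import numpy as np

rng = np.random.default_rng(3)

# X = Y uniform on (0, 1).  g is the identity; g_bar differs from g only on
# the P_Y-null set {1/2}, standing in for Q (every float is rational, so the
# event {Y in Q} cannot be probed directly in floating point).
y = rng.uniform(0, 1, size=1_000_000)

def g(t):
    return t

def g_bar(t):
    return np.where(t == 0.5, 0.0, t)

# The only property that matters: E(X 1_{Y in B}) = E(g(Y) 1_{Y in B})
# for every Borel set B; here B = (0.25, 0.75) as one illustration.
b = (y > 0.25) & (y < 0.75)
print((y * b).mean(), (g(y) * b).mean(), (g_bar(y) * b).mean())  # all agree
```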