Understanding conditional expectation via the measure-theoretic definition


Definition: For random variables $X\in\mathbb R^{d_1}$ and $Y\in\mathbb R^{d_2}$, we define a conditional expectation of $X$ given $Y$ to be any random variable $Z$ satisfying:

  • there exists $g:\mathbb R^{d_2}\rightarrow\mathbb R^{d_1}$ such that $Z=g(Y)$ and
  • $\mathbb E\left[Z\unicode{x1D7D9}_{\{Y\in A\}}\right]=\mathbb E\left[X\unicode{x1D7D9}_{\{Y\in A\}}\right]$ for all $A\subseteq \mathbb R^{d_2}$

To be honest, I don't understand the definition. In particular:

  • What is the reason for requiring $\mathbb E[X|Y]$ to be a function of $Y$?
  • Why is $\mathbb E\left[Z\unicode{x1D7D9}_{\{Y\in A\}}\right]=\mathbb E\left[X\unicode{x1D7D9}_{\{Y\in A\}}\right]$ needed for all $A\subseteq \mathbb R^{d_2}$?

Here is one example they mentioned:

$\Omega=[-1,1]$ and $\mathbb P$ is uniform distribution. Define $$\begin{align}X(\omega)&=-\frac12+\unicode{x1D7D9}_{\{\omega\in[-1,-1/2]\cup[0,1/2]\}}+2\unicode{x1D7D9}_{\{\omega\in[-1/2,0]\}}\\Y(\omega)&=\unicode{x1D7D9}_{\{\omega\geq0\}}\\Z(\omega)&=1-Y(\omega)\end{align}$$ Then $\mathbb E[X|Y]=Z$ and $\mathbb P(X=Z)=0$

I don't see how to compute the conditional expectation using the above definition.


Here is another definition, from A First Look at Rigorous Probability Theory by Jeffrey S. Rosenthal:

Definition: If $Y$ is a random variable and $B$ is an event with $\mathbb P(B)>0$, and if we define $v$ by $v(S)=\mathbb P(Y\in S|B)=\mathbb P(Y\in S,B)/\mathbb P(B)$, then $v=\mathcal L(Y|B)$ is a probability measure, called the conditional distribution of $Y$ given $B$. Since $\mathcal L(Y\unicode{x1D7D9}_{B})=\mathbb P(B)\,\mathcal L(Y|B)+\mathbb P(B^c)\,\delta_0$, taking expectations and rearranging gives $$\mathbb E(Y|B)=\mathbb E(Y\unicode{x1D7D9}_{B})/\mathbb P(B)$$

Here too I don't understand the role of $v$ and how it relates to the definition above.


There are 2 answers below.

BEST ANSWER

The problem with using $\mathbb E[X|Y=y]=\frac{\mathbb E[X\unicode{x1D7D9}_{Y=y}]}{\mathbb P(Y=y)}$ is that $\mathbb{P}(Y=y)$ may be $0$ for all $y$, for example if $Y$ is a normally distributed random variable.

We require $\mathbb{E}[X|Y]$ to be a function of $Y$ because we want to capture the idea that knowing $Y$ should be enough to compute $\mathbb{E}[X|Y]$, i.e. $\mathbb{E}[X|Y]$ depends only on the value of $Y$.

The condition $\mathbb{E}[Z\unicode{x1D7D9}_{Y \in A}] = \mathbb{E}[X\unicode{x1D7D9}_{Y \in A}]$ for all $A \subset \mathbb{R}^{d_2}$ (typically the definition is that $A$ is a Borel measurable subset, but that's not too important here) is sort of the generalization of $\mathbb E[X|Y=y]=\frac{\mathbb E[X\unicode{x1D7D9}_{Y=y}]}{\mathbb P(Y=y)}$. If we had that $\mathbb{P}(Y=y) > 0$, then we could set $A = \{y\}$ so that $\mathbb{E}[Z\unicode{x1D7D9}_{Y \in A}] = g(y) \mathbb{P}(Y=y)$ and the condition $\mathbb{E}[Z\unicode{x1D7D9}_{Y \in A}] = \mathbb{E}[X\unicode{x1D7D9}_{Y \in A}]$ would become \begin{align}g(y)\mathbb{P}(Y=y) &= \mathbb{E}[X\unicode{x1D7D9}_{Y =y}] \notag \\ g(y) &=\frac{\mathbb{E}[X\unicode{x1D7D9}_{Y =y}]}{\mathbb{P}(Y=y)},\end{align} so $\mathbb E[X|Y=y]$ would be defined the way you suggested. This property agrees with your definition when $\mathbb{P}(Y=y) > 0$, but still works for continuous random variables where $\mathbb{P}(Y=y) = 0$ for all $y$.

For the example given in the post, we have that $\mathbb{P}(Y = 1) = \mathbb{P}(Y=0) = \frac 12$, so we only need to find $g(0)$ and $g(1)$. Using the above equation for $g(y)$, we compute \begin{align*} g(0) &= \frac{\mathbb{E}[X\unicode{x1D7D9}_{Y =0}]}{\mathbb{P}(Y=0)} = 2 \int_{-1}^0 X(\omega) d \mathbb{P}(\omega) = \int_{-1}^0 X(\omega)d\omega = 1 \\ g(1) &= \frac{\mathbb{E}[X\unicode{x1D7D9}_{Y =1}]}{\mathbb{P}(Y=1)} = 2 \int_{0}^1 X(\omega) d \mathbb{P}(\omega) = \int_{0}^1 X(\omega) d\omega = 0, \end{align*} so $$\mathbb{E}[X|Y] = g(Y) = 1-Y.$$
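For readers who want to verify this numerically, here is a minimal Monte Carlo sketch (Python with NumPy; the sample size and seed are arbitrary choices of mine, not part of the original answer). It estimates $g(0)$ and $g(1)$ from the ratio formula above and then checks the defining property $\mathbb{E}[Z\unicode{x1D7D9}_{Y \in A}] = \mathbb{E}[X\unicode{x1D7D9}_{Y \in A}]$ on the events generated by $Y$:

```python
import numpy as np

rng = np.random.default_rng(0)
omega = rng.uniform(-1.0, 1.0, size=1_000_000)   # P = uniform distribution on [-1, 1]

# X, Y, Z from the example in the question
ind1 = ((omega <= -0.5) | ((omega >= 0.0) & (omega <= 0.5))).astype(float)
ind2 = ((omega >= -0.5) & (omega <= 0.0)).astype(float)
X = -0.5 + ind1 + 2.0 * ind2
Y = (omega >= 0.0).astype(float)
Z = 1.0 - Y

# g(y) = E[X 1_{Y=y}] / P(Y=y) at the two atoms of Y
for y in (0.0, 1.0):
    mask = (Y == y)
    print("g(%g) ~ %.3f" % (y, (X * mask).mean() / mask.mean()))   # ~1 and ~0

# Defining property E[Z 1_{Y in A}] = E[X 1_{Y in A}]; since Y takes only
# two values, it suffices to check A = {0}, {1} and {0, 1}.
for A in ([0.0], [1.0], [0.0, 1.0]):
    inA = np.isin(Y, A).astype(float)
    print(A, (Z * inA).mean(), (X * inA).mean())   # the two means agree up to MC error
```

Up to Monte Carlo error the paired means coincide, consistent with $\mathbb{E}[X|Y] = g(Y) = 1 - Y$.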

ANSWER

Here is another approach to the OP's question. Since the question is about functions with values in Euclidean space, it is enough to consider each component of the random vectors separately, so it suffices to consider random variables taking values in $\mathbb{R}$.

Here we give two alternative definitions of conditional expectation. The first is rather geometric (orthogonal projection in Hilbert space) and is probably the one to keep in mind; the second is more general and depends on the Radon–Nikodym theorem. Existence and uniqueness (almost-sure uniqueness, rather) will be clear in both approaches once the tools used are revealed. All equalities between functions in this posting are meant to hold almost surely.

  • Geometric approach: Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, and let $\mathcal{A}$ be a $\sigma$-algebra such that $\mathcal{A}\subset \mathcal{F}$. Consider the collection $\mathcal{H}=\mathcal{L}_2(\mathbb{P})$ of square-integrable random variables, i.e., $X\in\mathcal{H}$ iff $\mathbb{E}[X^2]<\infty$, and let $\mathcal{H}_\mathcal{A}$ be the subspace of $\mathcal{H}$ consisting of the $\mathcal{A}$-measurable functions. The space $\mathcal{H}$ is a Hilbert space with the inner product $\langle X, Y\rangle=\mathbb{E}[XY]$, and it is not difficult to check that $\mathcal{H}_\mathcal{A}$ is a closed subspace of $\mathcal{H}$. For any $X\in\mathcal{H}$, the orthogonal projection $P_{\mathcal{A}}X$ of $X$ onto $\mathcal{H}_{\mathcal{A}}$, which exists and is unique ($\mathbb{P}$-almost surely), is an $\mathcal{A}$-measurable function such that for any $Z\in\mathcal{H}_{\mathcal{A}}$, $$\mathbb{E}\Big[\big(X-P_\mathcal{A}(X)\big)^2\Big]\leq \mathbb{E}[(X-Z)^2]$$ and $$\mathbb{E}\big[(X-P_\mathcal{A}X)Z\big]=0$$ The first inequality means that $P_\mathcal{A}X$ is the best approximation to $X$ by elements of $\mathcal{H}_{\mathcal{A}}$. The second identity means that $X-P_\mathcal{A}(X)$ is orthogonal to the space $\mathcal{H}_\mathcal{A}$. The latter property implies that for any set $A\in\mathcal{A}$ $$\mathbb{E}[X\mathbb{1}_A]=\mathbb{E}[(X-P_\mathcal{A}(X))\mathbb{1}_A]+\mathbb{E}[P_{\mathcal{A}}(X)\mathbb{1}_A]=\mathbb{E}[P_{\mathcal{A}}(X)\mathbb{1}_A]$$ where the last identity follows from the fact that $\mathbb{1}_A$ is itself an element of $\mathcal{H}_{\mathcal{A}}$. This motivates the following definition, even in the case where $X$ is merely integrable and not necessarily square-integrable.

If $X$ is integrable, that is, $\mathbb{E}[|X|]<\infty$, then the conditional expectation of $X$ given $\mathcal{A}$ is an $\mathcal{A}$-measurable function, denoted by $\mathbb{E}[X|\mathcal{A}]$, such that for any $A\in\mathcal{A}$ \begin{align} \mathbb{E}[X\mathbb{1}_A]=\int_A X\,d\mathbb{P}=\int_A\mathbb{E} [X|\mathcal{A}]\,d\mathbb{P}=\mathbb{E}\Big[\mathbb{E}[X|\mathcal{A}]\mathbb{1}_A\Big]\tag{0}\label{zero} \end{align} When $X$ is also square integrable, $\mathbb{E}[X|\mathcal{A}]$ is the orthogonal projection of $X$ onto $\mathcal{H}_{\mathcal{A}}$; a numerical sketch of this projection picture is given after the next bullet.

  • Radon–Nikodym approach: The next approach is more general and is based on the Radon–Nikodym theorem; it also applies to general $\sigma$-finite measure spaces. Suppose $(E,\mathscr{E},\mu)$ is a $\sigma$-finite measure space, $(F,\mathscr{F})$ is a measurable space, and $T:(E,\mathscr{E})\rightarrow(F,\mathscr{F})$ is measurable. Let $f\in L_1(\mu)$ and define the (real- or complex-valued) measure $\mu^f$ by $\mu^f(A)=\int_A f\,d\mu$. The map $T$ induces measures $\mu_T$ and $\mu^f_T$ on $(F,\mathscr{F})$ defined by \begin{align} \mu_T(A)&=\mu(T^{-1}(A))=\int \big(\mathbb{1}_{A}\circ T\big)\,d\mu\tag{1}\label{one}\\ \mu^f_T(A)&=\mu^f(T^{-1}(A))=\int \big(\mathbb{1}_A\circ T\big)\,f\,d\mu\tag{2}\label{two} \end{align} for all $A\in\mathscr{F}$. Notice that $\mu^f_T$ is finite and absolutely continuous with respect to $\mu_T$: $\mu_T(A)=0$, i.e. $\mu(T^{-1}(A))=0$, implies that $\mu^f_T(A)=\int_{T^{-1}(A)} f\,d\mu=0$, and $|\mu^f_T(A)|\leq\int_{T^{-1}(A)}|f|\,d\mu\leq\int_E|f|\,d\mu<\infty$.
    If $\mu_T$ is $\sigma$-finite, that is, if $E$ can be covered by a sequence of sets $T^{-1}(A_n)$, $A_n\in\mathscr{F}$, with $\mu_T(A_n)<\infty$, then by the Radon–Nikodym theorem there exists a unique ($\mu_T$-a.e.) $\mathscr{F}$-measurable function, which we denote by $P_Tf$, such that \begin{align} \mu^f_T(A)=\int_A P_Tf\,d\mu_T,\qquad A\in\mathscr{F}\tag{3}\label{three} \end{align} $P_Tf$ is the conditional expectation of $f$ under $T$. The following special case is the most familiar: consider the case $E=F$ and $\mathscr{F}\subset\mathscr{E}$. The map $T:x\mapsto x$ is clearly $\mathscr{E}$-$\mathscr{F}$ measurable, and $\mu_T$ is the restriction of $\mu$ to $\mathscr{F}$. Then \eqref{three} reads $$\int \mathbb{1}_A f\,d\mu=\int\mathbb{1}_AP_Tf\,d\mu$$ When $\mu$ is a probability measure (so that all $\sigma$-finiteness requirements for the Radon–Nikodym theorem are satisfied), we obtain a condition similar to \eqref{zero}. In this case, the description of $P_Tf$ as the conditional expectation of $f$ given $T$ coincides with that of the conditional expectation of $f$ given $\mathscr{F}$.
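As promised in the geometric bullet, here is a minimal numerical sketch of the projection picture (Python with NumPy; the particular $X$ and the partition are my own illustrative assumptions, not taken from the answer). It takes $\mathcal{A}=\sigma(Y)$ for a two-valued $Y$, so that $\mathcal{H}_\mathcal{A}$ is spanned by the two cell indicators, and checks that the least-squares (orthogonal) projection of $X$ onto that span equals the cell-wise averages, i.e. $\mathbb{E}[X|\mathcal{A}]$:

```python
import numpy as np

rng = np.random.default_rng(1)
omega = rng.uniform(-1.0, 1.0, size=200_000)    # P uniform on [-1, 1]

X = np.sin(np.pi * omega) + omega**2            # an arbitrary square-integrable X
Y = (omega >= 0.0).astype(int)                  # sigma(Y): partition {omega < 0}, {omega >= 0}

# E[X | sigma(Y)] computed directly as the average of X over each cell
cond = np.where(Y == 1, X[Y == 1].mean(), X[Y == 0].mean())

# Orthogonal projection of X onto span{1_{Y=0}, 1_{Y=1}} via least squares
B = np.column_stack([(Y == 0).astype(float), (Y == 1).astype(float)])
coef, *_ = np.linalg.lstsq(B, X, rcond=None)
proj = B @ coef

print(np.max(np.abs(cond - proj)))              # ~0: the projection is the cell averages
print(np.mean((X - cond) * (Y == 1)))           # ~0: X - P_A(X) is orthogonal to H_A
```

The second print checks the orthogonality relation $\mathbb{E}[(X-P_\mathcal{A}X)Z]=0$ with $Z=\mathbb{1}_{\{Y=1\}}$; taking $Z=\mathbb{1}_A$ for $A\in\mathcal{A}$ is exactly the defining identity \eqref{zero}.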

Remarks:

  • The existence of $\mathbb{E}[X|\mathcal{A}]$ when $X$ is integrable but not square integrable can be obtained by approximating $X$ by the bounded functions $X_N=\max(-N,\min(X,N))$, $N\in\mathbb{N}$, each of which is square integrable, and then passing to a subsequence along which pointwise convergence holds. Uniqueness is easy: if $Y$ and $Z$ were two $\mathcal{A}$-measurable functions such that \eqref{zero} holds, then $A=\{Y<Z\}\in\mathcal{A}$ and $$0=\mathbb{E}\big[(Z-Y)\mathbb{1}_A\big]$$ Since $Z-Y>0$ on $A$, the properties of the integral imply that $\mathbb{P}[A]=0$. Similarly $\mathbb{P}[\{Z<Y\}]=0$, and so $Z=Y$ ($\mathbb{P}$-a.s.).
  • In a rather similar setting, the operator $P_T$ is also known as the Perron–Frobenius transfer operator.