Definition: For random variables $X\in\mathbb R^{d_1}$ and $Y\in\mathbb R^{d_2}$, we define a conditional expectation of $X$ given $Y$ to be any random variable $Z$ satisfying:
- there exists $g:\mathbb R^{d_2}\rightarrow\mathbb R^{d_1}$ such that $Z=g(Y)$ and
- $\mathbb E\left[Z\unicode{x1D7D9}_{\{Y\in A\}}\right]=\mathbb E\left[X\unicode{x1D7D9}_{\{Y\in A\}}\right]$ for all $A\subseteq \mathbb R^{d_2}$
To be honest, I don't understand this definition. In particular:
- why $\mathbb E[X|Y]$ is required to be a function of $Y$, and
- why $\mathbb E\left[Z\unicode{x1D7D9}_{\{Y\in A\}}\right]=\mathbb E\left[X\unicode{x1D7D9}_{\{Y\in A\}}\right]$ must hold for all $A\subseteq \mathbb R^{d_2}$.
Here is one example they mentioned:
$\Omega=[-1,1]$ and $\mathbb P$ is uniform distribution. Define $$\begin{align}X(\omega)&=-\frac12+\unicode{x1D7D9}_{\{\omega\in[-1,-1/2]\cup[0,1/2]\}}+2\unicode{x1D7D9}_{\{\omega\in[-1/2,0]\}}\\Y(\omega)&=\unicode{x1D7D9}_{\{\omega\geq0\}}\\Z(\omega)&=1-Y(\omega)\end{align}$$ Then $\mathbb E[X|Y]=Z$ and $\mathbb P(X=Z)=0$
I don't see how to compute the conditional expectation using the above definition.
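To convince myself that the example is at least consistent, I wrote a quick numerical check (my own script, not from the book) that $Z$ satisfies the second condition. Since $Y$ only takes the values $0$ and $1$, it is enough to check the sets $A=\{0\}$, $\{1\}$, and $\{0,1\}$:

```python
import numpy as np

rng = np.random.default_rng(0)
omega = rng.uniform(-1.0, 1.0, size=10**6)   # P = uniform on [-1, 1]

def ind(c):
    return c.astype(float)                   # indicator function

X = (-0.5
     + ind(((-1.0 <= omega) & (omega <= -0.5)) | ((0.0 <= omega) & (omega <= 0.5)))
     + 2.0 * ind((-0.5 <= omega) & (omega <= 0.0)))
Y = ind(omega >= 0.0)
Z = 1.0 - Y

# Y only takes values 0 and 1, so checking these three sets A suffices.
for A in ([0.0], [1.0], [0.0, 1.0]):
    lhs = np.mean(Z * np.isin(Y, A))   # E[Z 1_{Y in A}]
    rhs = np.mean(X * np.isin(Y, A))   # E[X 1_{Y in A}]
    print(A, lhs, rhs)                 # the two agree up to Monte Carlo error

print(np.mean(X == Z))                 # 0.0: X and Z never coincide
```

The two sides agree for every such $A$, and indeed $X$ and $Z$ never coincide, but I still don't see how one would *derive* $Z$ from the definition.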
Here is another definition from A First Look at Rigorous Probability Theory, by Jeffrey S. Rosenthal
Definition: If $Y$ is a random variable and $B$ is an event with $\mathbb P(B)>0$, and if we define $v$ by $v(S)=\mathbb P(Y\in S|B)=\mathbb P(Y\in S,B)/\mathbb P(B)$, then $v=\mathcal L(Y|B)$ is a probability measure, called the conditional distribution of $Y$ given $B$. Moreover, $\mathcal L(Y\unicode{x1D7D9}_{B})=\mathbb P(B)\mathcal L(Y|B)+\mathbb P(B^c)\delta_0$, so taking expectations and re-arranging, $$\mathbb E(Y|B)=\mathbb E(Y\unicode{x1D7D9}_{B})/\mathbb P(B)$$
Here, too, I don't understand the role of $v$, or how this relates to the definition above.
The problem with defining $\mathbb E[X|Y=y]=\frac{\mathbb E[X\unicode{x1D7D9}_{Y=y}]}{\mathbb P(Y=y)}$ directly is that $\mathbb{P}(Y=y)$ may be $0$ for every $y$, for instance when $Y$ is normally distributed.
We require $\mathbb{E}[X|Y]$ to be a function of $Y$ because we want to capture the idea that knowing $Y$ should be enough to compute $\mathbb{E}[X|Y]$, i.e. $\mathbb{E}[X|Y]$ depends only on the value of $Y$.
The condition $\mathbb{E}[Z\unicode{x1D7D9}_{Y \in A}] = \mathbb{E}[X\unicode{x1D7D9}_{Y \in A}]$ for all $A \subseteq \mathbb{R}^{d_2}$ (typically the definition requires $A$ to be a Borel measurable subset, but that's not too important here) generalizes $\mathbb E[X|Y=y]=\frac{\mathbb E[X\unicode{x1D7D9}_{Y=y}]}{\mathbb P(Y=y)}$. If $\mathbb{P}(Y=y) > 0$, we could set $A = \{y\}$, so that $\mathbb{E}[Z\unicode{x1D7D9}_{Y \in A}] = g(y) \mathbb{P}(Y=y)$, and the condition $\mathbb{E}[Z\unicode{x1D7D9}_{Y \in A}] = \mathbb{E}[X\unicode{x1D7D9}_{Y \in A}]$ would become \begin{align}g(y)\mathbb{P}(Y=y) &= \mathbb{E}[X\unicode{x1D7D9}_{Y =y}] \notag \\ g(y) &=\frac{\mathbb{E}[X\unicode{x1D7D9}_{Y =y}]}{\mathbb{P}(Y=y)},\end{align} so $\mathbb E[X|Y=y]$ would be defined exactly as you suggested. The new definition therefore agrees with yours whenever $\mathbb{P}(Y=y) > 0$, but it still makes sense for continuous random variables, where $\mathbb{P}(Y=y) = 0$ for all $y$.
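To see this identity in action in the discrete case, here is a small sketch (a made-up example, not from the post): for a discrete $Y$ with $\mathbb P(Y=y)>0$, the ratio $\mathbb E[X\unicode{x1D7D9}_{Y=y}]/\mathbb P(Y=y)$ is exactly the average of $X$ over the outcomes where $Y=y$:

```python
import numpy as np

rng = np.random.default_rng(1)
# A made-up discrete example: Y takes values 0, 1, 2 with positive
# probability, and X is correlated with Y (here E[X | Y=y] = y).
Y = rng.integers(0, 3, size=10**6)
X = Y + rng.normal(size=Y.size)

for y in range(3):
    g_y = np.mean(X * (Y == y)) / np.mean(Y == y)  # E[X 1_{Y=y}] / P(Y=y)
    direct = X[Y == y].mean()                      # average of X where Y = y
    print(y, g_y, direct)                          # the two coincide
```

The two quantities are algebraically identical here; the point of the measure-theoretic definition is that the left-hand condition still makes sense when no such ratio exists.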
For the example given in the post, we have that $\mathbb{P}(Y = 1) = \mathbb{P}(Y=0) = \frac 12$, so we only need to find $g(0)$ and $g(1)$. Using the above equation for $g(y)$, we compute \begin{align*} g(0) &= \frac{\mathbb{E}[X\unicode{x1D7D9}_{Y =0}]}{\mathbb{P}(Y=0)} = 2 \int_{-1}^0 X(\omega) d \mathbb{P}(\omega) = \int_{-1}^0 X(\omega)d\omega = 1 \\ g(1) &= \frac{\mathbb{E}[X\unicode{x1D7D9}_{Y =1}]}{\mathbb{P}(Y=1)} = 2 \int_{0}^1 X(\omega) d \mathbb{P}(\omega) = \int_{0}^1 X(\omega) d\omega = 0, \end{align*} so $$\mathbb{E}[X|Y] = g(Y) = 1-Y.$$
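For completeness, the two integrals above can be double-checked with a quick Monte Carlo simulation (my own sketch, not part of the quoted material):

```python
import numpy as np

rng = np.random.default_rng(2)
omega = rng.uniform(-1.0, 1.0, size=10**6)   # omega ~ uniform on [-1, 1]

# X and Y from the example; the overlapping boundary points of the
# indicator sets have probability zero, so ties are broken arbitrarily.
X = np.where(omega < -0.5, 0.5,
    np.where(omega < 0.0, 1.5,
    np.where(omega <= 0.5, 0.5, -0.5)))
Y = (omega >= 0.0).astype(float)

g0 = np.mean(X * (Y == 0)) / np.mean(Y == 0)   # E[X 1_{Y=0}] / P(Y=0), about 1
g1 = np.mean(X * (Y == 1)) / np.mean(Y == 1)   # E[X 1_{Y=1}] / P(Y=1), about 0
print(g0, g1)
```

Up to Monte Carlo error this reproduces $g(0)=1$ and $g(1)=0$, i.e. $\mathbb E[X|Y]=1-Y$.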