Why do I have $\Bbb{E}\left(|\Bbb{E}(X|Y)|\right)=\sum_{y\in E'} \Bbb{E}\left(|\Bbb{E}(X|Y)|\Bbb{1}_{Y=y}\right)$?


I have a question about the conditional expectation of $X$ given $Y$. Let $Y$ be a discrete random variable with values in $E$, and define $E'=\{y\in E:\Bbb{P}(Y=y)>0\}$. Then for all $X\in L^1(\Bbb{P})$ and all $y\in E'$ we can compute $$\Bbb{E}(X|Y=y)=\frac{\Bbb{E}(X\Bbb{1}_{Y=y})}{\Bbb{P}(Y=y)}=:\phi(y),$$ and we define $$\Bbb{E}(X|Y)=\phi\circ Y=:\phi(Y).$$
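For concreteness, here is a minimal numerical sketch of this definition. The joint pmf below, and the names `joint`, `p_Y`, and `phi`, are made up purely for illustration:

```python
# Toy joint pmf of a discrete pair (X, Y); all values are made up for illustration.
joint = {  # (x, y) -> P(X = x, Y = y)
    (1, 0): 0.2, (2, 0): 0.3,
    (1, 1): 0.1, (2, 1): 0.4,
}

def p_Y(y):
    # P(Y = y) = sum over x of P(X = x, Y = y)
    return sum(p for (_, yy), p in joint.items() if yy == y)

def phi(y):
    # phi(y) = E[X 1_{Y=y}] / P(Y = y), the defining formula above
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_Y(y)

print(phi(0))  # (1*0.2 + 2*0.3) / 0.5 = 1.6
print(phi(1))  # (1*0.1 + 2*0.4) / 0.5 = 1.8
```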

My question is now: why do I know that $$\Bbb{E}(|\Bbb{E}(X|Y)|)=\sum_{y\in E'} \Bbb{E}\left(|\Bbb{E}(X|Y)|\Bbb{1}_{Y=y}\right)?$$

I know that $Y$ is discrete, so it takes values in a countable set, and therefore the sum makes sense to me. I also see that the right-hand side partitions the expectation according to the events $\{Y=y\}$, so intuitively I see why this should be true. But I think I'm missing something in the mathematical argument. Could someone explicitly explain to me what's going on in this equation, i.e. why this partition is allowed?

Thanks for your help.

Best answer:

I think the notation for conditional expectations and nested expectations can at times be a bit confusing, so I'll write it out as explicitly as possible.

The first thing to note is that $\mathbb{E}(X|Y=y)$ is just a constant which depends on $y$. So if we make $y$ a random variable, call it $Y'$, then we're really looking at the random variable $$\mathbb{E}(X|Y = Y')$$ (this is random because $Y'$ is random). It has distribution* $$\mathbb{P}\{\mathbb{E}(X|Y = Y') = \mathbb{E}(X|Y=y)\} = \mathbb{P}(Y'=y) = \mathbb{P}(Y=y),$$ and it's a notational shorthand to write this random variable simply as $\mathbb{E}(X|Y)$.
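In code terms, $\mathbb{E}(X|Y)$ is sampled by first sampling $Y$ and then applying $\phi$. A minimal sketch, reusing the hypothetical marginal and the values of `phi` from the toy pmf above:

```python
import random

random.seed(0)
marginal_Y = {0: 0.5, 1: 0.5}  # made-up marginal of Y from the toy pmf
phi_table = {0: 1.6, 1: 1.8}   # phi(y) = E[X | Y = y] computed above

def sample_cond_exp():
    # E(X|Y) = phi(Y): the only randomness comes from the draw of Y
    y = random.choices(list(marginal_Y), weights=list(marginal_Y.values()))[0]
    return phi_table[y]

print(sample_cond_exp())  # takes value 1.6 or 1.8, each with probability 0.5
```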

Okay, so now that we have an understanding of what $\mathbb{E}(X|Y)$ really is as a random variable, we can start talking about its expectation.

\begin{align}
\mathbb{E}(|\mathbb{E}(X|Y)|) &= \mathbb{E}_{Y'}\left(|\mathbb{E}_X(X|Y=Y')|\right) & (\text{notation}) \\
&= \sum_{y \in E} \mathbb{P}(Y'=y)\,|\mathbb{E}_X(X|Y=y)| & (\text{dist. of } \mathbb{E}(X|Y = Y')) \\
&= \sum_{y \in E} \mathbb{E}_{Y'}(\mathbb{1}[Y'=y])\,|\mathbb{E}_X(X|Y=y)| & (\text{see Fact 1 below}) \\
&= \sum_{y \in E} \mathbb{E}_{Y'}\left(|\mathbb{E}_X(X|Y=y)|\,\mathbb{1}[Y'=y]\right) & (|\mathbb{E}_X(X|Y=y)| \text{ is a constant}) \\
&= \sum_{y \in E} \mathbb{E}_{Y'}\left(|\mathbb{E}_X(X|Y=Y')|\,\mathbb{1}[Y'=y]\right) & (\text{both are } 0 \text{ when } Y' \neq y) \\
&= \sum_{y \in E} \mathbb{E}_{Y}\left(|\mathbb{E}_X(X|Y)|\,\mathbb{1}[Y=y]\right) & (\text{notation})
\end{align}

Finally, for $y \in E \setminus E'$ we have $\mathbb{P}(Y=y)=0$, so $\mathbb{E}_Y(|\mathbb{E}_X(X|Y)|\,\mathbb{1}[Y=y])=0$ and those terms drop out; the sum over $E$ therefore equals the sum over $E'$, which is exactly the identity in the question.
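A hedged numerical check of the resulting identity, on a made-up pmf of the same shape as before (partitioning the expectation of $|\phi(Y)|$ by the value of $Y$):

```python
# Check E(|E(X|Y)|) = sum over y of E(|E(X|Y)| 1_{Y=y}) on a toy pmf.
joint = {  # (x, y) -> P(X = x, Y = y); made-up values, with a sign flip on X
    (-3, 0): 0.2, (2, 0): 0.3,
    (-3, 1): 0.1, (2, 1): 0.4,
}
E_prime = {0, 1}  # the values y with P(Y = y) > 0

def p_Y(y):
    return sum(p for (_, yy), p in joint.items() if yy == y)

def phi(y):  # E[X | Y = y]
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_Y(y)

# LHS: expectation of |phi(Y)|, summed over every atom of the joint pmf
lhs = sum(abs(phi(y)) * p for (_, y), p in joint.items())
# RHS: one term per y; |phi(Y)| 1_{Y=y} equals |phi(y)| on {Y=y} and 0 elsewhere
rhs = sum(abs(phi(y)) * p_Y(y) for y in E_prime)
assert abs(lhs - rhs) < 1e-12
```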

Fact 1, which I used above, is that for every random variable $Z$ and constant $c$ we have \begin{equation} \mathbb{E}\,\mathbb{1}[Z = c] = 1\cdot\mathbb{P}[Z = c] + 0\cdot\mathbb{P}[Z \neq c] = \mathbb{P}[Z=c]. \end{equation} That is, the probability of an event equals the expected value of the indicator of that event.
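A quick Monte Carlo sanity check of Fact 1 (the distribution and sample size here are arbitrary):

```python
import random

random.seed(0)
n = 100_000
samples = [random.randint(1, 6) for _ in range(n)]  # Z uniform on {1, ..., 6}
c = 3
# E 1[Z = c], estimated by the sample mean of the indicator
mean_indicator = sum(z == c for z in samples) / n
print(mean_indicator, "vs", 1 / 6)  # the two agree up to Monte Carlo error
```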

* For ease of exposition I'm ignoring the corner case that there may be multiple values $y_1, y_2, \dots, y_k$ such that $\mathbb{E}(X|Y=y_1) = \mathbb{E}(X|Y=y_2) = \dots = \mathbb{E}(X|Y=y_k)$; in general the probability should be \begin{align} \mathbb{P}\{\mathbb{E}(X|Y = Y') = c\} = \sum_{y \in E} \mathbb{P}\{Y = y\}\, \mathbb{1}[\mathbb{E}(X|Y=y) = c], \end{align} i.e. we sum the probabilities of all values $y$ whose conditional expectation equals $c$.
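This merging step is easy to mirror in code; a minimal sketch with a deliberately tied pair of conditional expectations (all numbers made up):

```python
from collections import defaultdict

marginal_Y = {0: 0.5, 1: 0.3, 2: 0.2}  # made-up marginal of Y
phi_table = {0: 1.6, 1: 1.6, 2: 1.8}   # note phi(0) == phi(1): a deliberate tie

# pmf of E(X|Y): for each value c, add up P(Y = y) over all y with phi(y) == c
pmf = defaultdict(float)
for y, p in marginal_Y.items():
    pmf[phi_table[y]] += p
print(dict(pmf))  # {1.6: 0.8, 1.8: 0.2}
```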