why is $E[E[Y|X]] = E[Y]$


I have a derivation from my book, and I have a problem with the very first line:

$$ \begin{align} E[E(Y\mid X)] &= \int_{-\infty}^\infty E(Y\mid x)\,f_1(x)\,dx \qquad \leftarrow \text{why } dx\text{?}\\ &= \int_{-\infty}^\infty\int_{-\infty}^\infty y\,f(y\mid x)\,f_1(x)\,dy\,dx\\ &=\int_{-\infty}^\infty y \int_{-\infty}^\infty f(x,y)\,dx\,dy\\ &=\int_{-\infty}^\infty y\,f_2(y)\,dy\\ &= E(Y) \end{align} $$

Now, everything after the first integral I understand: that's just splitting the integral up and getting marginal/joint densities. But why do we choose $dx$ in the first integral? It seems arbitrary; what is it based on?
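As a numerical sanity check of the identity itself, here is a small Monte Carlo sketch using an example of my own (not from the book): take $X\sim\mathrm{Uniform}(0,1)$ and, given $X=x$, let $Y\sim N(2x,1)$, so that $E(Y\mid X)=2X$ and both sides should equal $2\,E(X)=1$.

```python
import numpy as np

# Hypothetical example (not from the book): X ~ Uniform(0,1),
# and given X = x, Y ~ Normal(2x, 1), so E[Y | X] = 2X.
# Then E[E[Y|X]] = E[2X] = 2 * 0.5 = 1, which should match E[Y].
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.uniform(0.0, 1.0, n)
y = rng.normal(2.0 * x, 1.0)   # sample Y given X = x

inner = 2.0 * x                # E[Y | X] evaluated at each draw of X
print(inner.mean())            # Monte Carlo estimate of E[E[Y|X]], about 1.0
print(y.mean())                # direct estimate of E[Y], also about 1.0
```

Both estimates agree up to Monte Carlo error, as the derivation predicts.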

5 Answers

BEST ANSWER

$E[Y\mid X]$ is a function of $X$: its value when $X=x$ is $E[Y\mid X=x]$. We now weight each of these values by $f_1(x)$, intuitively the probability that $X=x$, and ‘average’ (i.e., integrate over $x$) to get the expected value of $E[Y\mid X]$, which turns out to be $E[Y]$.

It’s no different in principle from calculating $E[Z]$ when $Z$ is any other function of $X$, e.g., $Z=X^2$.
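To make the analogy concrete, here is a quick numerical sketch of exactly that computation, with an example of my own: for $X\sim\mathrm{Uniform}(0,1)$ and $Z=X^2$, we get $E[Z]=\int_0^1 x^2 f_1(x)\,dx = 1/3$, and the integral over $x$ against $f_1$ is the same $dx$-step as the first line of the derivation.

```python
import numpy as np

# Sketch with my own example: X ~ Uniform(0,1), Z = X^2.
# E[Z] = integral of x^2 * f_1(x) dx with f_1(x) = 1 on [0, 1] --
# the same "integrate a function of x against the density of X"
# step as the first line of the derivation in the question.
n = 100_000
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx     # midpoints of a grid on [0, 1]
f1 = np.ones_like(x)              # density of Uniform(0, 1)
e_z = np.sum(x**2 * f1) * dx      # midpoint rule, close to 1/3
print(e_z)
```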

Answer

What does $E[Y|X]$ represent? It is the expected value of $Y$ based on the value/outcome of $X$. Therefore, it is a function of $X$. Think of it as $f(X) := E[Y|X]$, if it helps.

Then, $E[E[Y|X]] = E[f(X)]$, which hopefully makes it clear why you integrate with $dx$.

Answer

I am assuming that $X$ and $Y$ are r.v.s. In that case, $E(Y|X)$ is a r.v. itself and its realization depends on the value you assign to $X$. In other words, its randomness comes entirely from the randomness of $X$.

Maybe writing $E(Y\mid X=x)$ lets you see more clearly why you are integrating over $x$.

Answer

This is an instance of the tower law for conditional expectation, a result which can be proven easily with a little measure theory; see the book of David Williams (*Probability with Martingales*). It says that, given two sigma-algebras $\mathcal F$ and $\mathcal G$ such that $\mathcal F$ contains $\mathcal G$, $$ E[\,\cdot \mid \mathcal G] = E\big[\,E[\,\cdot \mid \mathcal F]\,\big|\,\mathcal G\big]. $$ Now, since $E[\,\cdot\,]$ is equal to $E[\,\cdot \mid \text{trivial sigma-algebra}]$, the result you are looking for follows. This result is much more general than the derivation you give in the question: it is a fundamental property of the conditional expectation. By the way, probability theory is built on measure theory; you should learn it if you want to do any serious work with probability.
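The tower law can be checked by hand on a finite space. Below is a toy verification of my own construction (not from the answer): on a 4-point space, $\mathcal F$ is generated by the partition into singletons (so $E[X\mid \mathcal F]=X$) and $\mathcal G$ by the coarser partition $\{0,1\},\{2,3\}$, with $\mathcal G \subseteq \mathcal F$.

```python
import numpy as np

# Toy check of the tower law on a 4-point space (my own construction).
p = np.array([0.1, 0.2, 0.3, 0.4])   # probabilities of the 4 outcomes
x = np.array([1.0, 2.0, 5.0, -1.0])  # values of X at each outcome

def cond_exp(vals, probs, blocks):
    """Conditional expectation given the sigma-algebra generated by a partition."""
    out = np.empty_like(vals)
    for b in blocks:
        b = list(b)
        out[b] = np.dot(vals[b], probs[b]) / probs[b].sum()
    return out

e_x_given_g = cond_exp(x, p, [[0, 1], [2, 3]])       # E[X | G]
e_x_given_f = cond_exp(x, p, [[0], [1], [2], [3]])   # E[X | F] = X itself
tower = cond_exp(e_x_given_f, p, [[0, 1], [2, 3]])   # E[ E[X|F] | G ]
print(tower)          # matches e_x_given_g pointwise, as the tower law says
print(np.dot(x, p))   # E[X] = E[X | trivial sigma-algebra]
```

Conditioning on the trivial sigma-algebra (the one-block partition) collapses everything to the plain expectation, which is why $E[E[Y|X]] = E[Y]$ drops out as the special case $\mathcal G = \{\emptyset,\Omega\}$.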

Answer

A random variable $h(Y):=\mathsf E[X|Y]$ is the best (in a certain sense) approximation of $X$ by random variables of the form $g(Y)$ where $g$ is any "nice" (=measurable) function.

It is the best in the sense that it may happen that $X$ cannot be represented as a function of $Y$, so the behavior of $X$ and $h(Y)$ can differ in general. However, the behavior of $X$ and $h(Y)$ on the level sets of $Y$ is the same, which is expressed in the formula $$ \mathsf E[X1_A] = \mathsf E[h(Y)1_A]\text{ for all }A\in \sigma(Y). \tag{1} $$ Now, the whole set $\Omega$ is clearly among the level sets of $Y$, so a special case of $(1)$ is $$ \mathsf E[X] = \mathsf E[h(Y)] = \mathsf E[\mathsf E[X|Y]]. $$
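Property $(1)$ is easy to verify on a small discrete example. The following sketch uses values of my own choosing (not from the answer): $Y$ takes values in $\{0,1\}$, $h(y)=\mathsf E[X\mid Y=y]$ is computed by averaging $X$ over each level set of $Y$, and $A=\{Y=0\}$ is an event in $\sigma(Y)$.

```python
import numpy as np

# Discrete check of (1) with my own numbers: 4 outcomes, Y in {0, 1}.
p  = np.array([0.2, 0.3, 0.1, 0.4])    # joint probabilities of the outcomes
yv = np.array([0, 0, 1, 1])            # value of Y at each outcome
xv = np.array([3.0, -1.0, 2.0, 4.0])   # value of X at each outcome

# h(y) = E[X | Y = y]: average X over each level set of Y
h = {y: np.dot(xv[yv == y], p[yv == y]) / p[yv == y].sum() for y in (0, 1)}
hy = np.array([h[y] for y in yv])      # the random variable h(Y) = E[X | Y]

ind = (yv == 0).astype(float)          # indicator of A = {Y = 0} in sigma(Y)
print(np.dot(xv * ind, p), np.dot(hy * ind, p))  # equal: E[X 1_A] = E[h(Y) 1_A]
print(np.dot(xv, p), np.dot(hy, p))              # A = Omega gives E[X] = E[h(Y)]
```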