Law of total expectation?


Apparently $E[X] = E[E[X\mid Y]]$ but I don't understand what this really means. I looked at https://en.wikipedia.org/wiki/Law_of_total_expectation but need another explanation.

Isn't this the same as $E[X] = E[X\mid Y]$? Why the extra $E[ \cdot ]$? And ultimately why does the $Y$ not seem to even matter if we say that $X$ depends on $Y$?


Best answer:

$\newcommand{\E}{\operatorname{E}}$The random variable $X$ has a conditional probability distribution given the event $Y=y$, for each value $y$ that the random variable $Y$ can take. Hence it also has a conditional expected value $\E(X\mid Y=y)$. This conditional expected value of course depends on $y$; thus we can write $\E(X\mid Y=y) = g(y)$.

Then $g(Y)$ is a random variable, and we denote it $\E(X\mid Y)$.

As a concrete example, suppose five red marbles and three green marbles are in an urn, and you draw two of them without replacement. Let $Y$ be the number of red marbles on the first draw (either $0$ or $1$) and $X$ on the second. Then $$ Y = \begin{cases} 0 & \text{with probability } \dfrac 3 8, \\[6pt] 1 & \text{with probability } \dfrac 5 8. \end{cases} $$

\begin{align} \E(X\mid Y=0) & = \frac 5 7, \\[10pt] \E(X\mid Y=1) & = \frac 4 7. \end{align} Therefore $$ \E(X\mid Y) = \begin{cases} \dfrac 5 7 & \text{with probability } \dfrac 3 8, \\[6pt] \dfrac 4 7 & \text{with probability } \dfrac 5 8. \end{cases} $$ That's what $\E(X\mid Y)$ means. And with that probability distribution of the random variable $\E(X\mid Y)$, you can find $\E(\E(X\mid Y))$.
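A quick sanity check of the urn numbers above (a sketch, not part of the original answer), using exact arithmetic with Python's `fractions` module:

```python
from fractions import Fraction

# Distribution of Y, the number of red marbles on the first draw
p_y = {0: Fraction(3, 8), 1: Fraction(5, 8)}

# E[X | Y = y], the conditional expectation of the second draw
e_x_given_y = {0: Fraction(5, 7), 1: Fraction(4, 7)}

# E[E[X | Y]]: weight each conditional expectation by P(Y = y)
e_x = sum(p_y[y] * e_x_given_y[y] for y in p_y)
print(e_x)  # 5/8
```

The result $\frac 5 8$ agrees with the symmetry argument: the second marble drawn is red with the same probability as the first, namely $\frac 5 8$.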

Answer:

A concrete example:

Suppose $X$ can take the values $0$ or $1$. Suppose $Y$ can take the values $a$ or $b$ each with probability $1/2$. Suppose that if $Y=a$ then the probability $X=0$ is $1$ while if $Y=b$ then the probability $X=0$ is $0$. Then $E[X\mid Y=a]=0$ while $E[X\mid Y=b]=1$. Then $$E[X]=E[E[X\mid Y]]=\frac{1}{2}E[X\mid Y=a]+\frac{1}{2}E[X\mid Y=b]=\frac{1}{2}0+\frac{1}{2}1=\frac{1}{2}.$$

You can see that $Y$ "disappears" because when you take the expectation of $E[X\mid Y]$ you are just weighting the conditional expectations of $X$ given each $Y$ by the probability of that value of $Y$.
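To see the same "disappearing $Y$" numerically, here is a small Monte Carlo sketch of this example (the seed and sample size are arbitrary choices, not from the original answer):

```python
import random

random.seed(0)
n = 100_000
total = 0
for _ in range(n):
    y = random.choice(["a", "b"])   # Y takes a or b, each with probability 1/2
    x = 0 if y == "a" else 1        # given Y=a, X=0 surely; given Y=b, X=1 surely
    total += x

print(total / n)  # close to 0.5 = E[X]
```

The sample mean of $X$ approaches $\frac12$, even though each individual draw of $X$ is completely determined by $Y$: averaging over $Y$ is exactly what the outer expectation does.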

Answer:

Consider a pair of variates $(X,Y)$ defined by the following process:

Roll one six-sided die; call the result $Y$. Then roll $Y$ six-sided dice and call the sum of the pips $X$.

Now $E[X\mid Y]$ will be given by $\frac72 Y$; for example, if the $Y$ roll was a $2$, then by rolling two dice you get an expected value of $7$ for $X$ -- this is $E[X\mid Y=2]$. If $Y$ is something else, $k$, then $E[X\mid Y=k]$ will be different.

How would we calculate the overall $E[X]$? Well, $\frac16$ of the time $Y$ will be $1$, and then $E[X]$ would be $\frac72$ for those cases. And $\frac16$ of the time $Y$ will be $2$, and then $E[X]$ would be $7$ for those cases. And so forth. Adding those up, what you have done is take the expectation (over the possible values of $Y$) of the expectation of $X$ given each value of $Y$. And that is what your statement says.

BTW, the answer for this simple problem is $E[X] = \frac72 E[Y] = \frac72 \cdot \frac72 = \frac{49}{4}$.
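The "adding those up" step can be written out exactly (a sketch of the computation, again using the `fractions` module):

```python
from fractions import Fraction

# E[X | Y = k] = (7/2) * k when k dice are rolled
e_x_given_y = {k: Fraction(7, 2) * k for k in range(1, 7)}

# Y is uniform on {1, ..., 6}, so each conditional expectation gets weight 1/6
e_x = sum(Fraction(1, 6) * e_x_given_y[k] for k in range(1, 7))
print(e_x)  # 49/4
```

This is the law of total expectation applied literally: $E[X] = \sum_k P(Y=k)\, E[X \mid Y=k] = \frac{49}{4}$.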