Basic questions about $E[E[X \mid Y]]$


Let $X$ and $Y$ be discrete random variables with real values $\{x_1, \dots, x_n\}$, $\{y_1, \dots, y_m\}$ respectively.

The questions below arose from considering the expression $E[E[X \mid Y]]$ and asking whether it is well-defined and unambiguous.

In my mind, we can make sense of $E[X = x_i \mid Y]$; it is simply $\sum_{j=1}^m y_jP(X = x_i \mid Y = y_j)$. Then $E[E[X \mid Y]] = \sum_{i=1}^nx_iE[X = x_i \mid Y]$.

$1$. Can we view this as the expectation of the discrete random variable $X \mid Y$ which has values $\{(x_i, y_j) \mid i = 1, \dots, n, j = 1, \dots, m\}$? Or is $X \mid Y$ not a random variable?

Approaching the expression $E[E[X \mid Y]]$ in a different way, I want to swap the order in which we take the expectations.

$2$. Is $E[X \mid Y = y_j] = \sum_{i=1}^n x_i P(X = x_i \mid Y = y_j)$ a common thing to consider? If not, why not?

Then we would have $E[E[X \mid Y]] = \sum_{j=1}^m y_j E[X \mid Y = y_j]$. Note, this agrees with the other calculation of $E[E[X \mid Y]]$.

In trying to indicate the order of the variables I was taking expectations of, I wrote some expressions which I'm not sure are defined.

$3$. Does it make sense to write $E[X \mid E[Y]]$ or $E[E[X] \mid Y]$? If so, is there any relation between these things and $E[E[X \mid Y]]$?


There are 2 best solutions below

6
On BEST ANSWER

Firstly, $E[X=x_i|Y]$ makes no sense; see the end of this answer.

  1. $X|Y$ is not an object in rigorous probability theory, so I am not sure what your question means. For discrete random variables (i.e. $X$ and $Y$ take only countably many values), $E[X|Y] = \sum_{i=1}^\infty E[X|Y=y_i]1_{\{Y=y_i\}}$, so $E[X|Y]$ is a random variable.

Maybe by $X|Y$ you mean the distribution of $X$ conditional on $Y$? I don't know what the proper notation for this is.

  2. This is correct.

  3. $E[X]$ and $E[Y]$ are numbers, so $E[E[X]|Y]=E[X]$ and $E[Y|E[X]]=E[Y]$. I originally said that conditioning on $E[Y]$ makes no sense but, thanks to Did and Byron, what I should have said is that conditioning on a constant is not particularly interesting and tells you nothing.
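To make items 1 and 3 concrete, here is a minimal Python sketch. The joint distribution and the helper names (`marginal_y`, `cond_exp_given_y`) are made up for illustration; they are not from the question. It tabulates $y \mapsto E[X|Y=y]$, which is the function that turns $E[X|Y]$ into a random variable, and shows that conditioning the constant $E[X]$ on $Y$ just returns that constant.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); the numbers are illustrative only.
joint = {
    (0, 0): F(1, 8), (1, 0): F(3, 8),
    (0, 1): F(1, 4), (1, 1): F(1, 4),
}

def marginal_y(joint):
    """Return P(Y = y) for every value y that Y can take."""
    p = {}
    for (x, y), pr in joint.items():
        p[y] = p.get(y, 0) + pr
    return p

def cond_exp_given_y(g, joint):
    """Return the map y -> E[g(X, Y) | Y = y]."""
    p_y = marginal_y(joint)
    num = {}
    for (x, y), pr in joint.items():
        num[y] = num.get(y, 0) + g(x, y) * pr
    return {y: num[y] / p_y[y] for y in p_y}

# h(y) = E[X | Y = y]; the random variable E[X | Y] takes the value h(y) on {Y = y}.
h = cond_exp_given_y(lambda x, y: x, joint)

# E[X] is a plain number, so E[E[X] | Y = y] is that same number for every y.
e_x = sum(x * pr for (x, y), pr in joint.items())
const = cond_exp_given_y(lambda x, y: e_x, joint)

print(h)
print(e_x, const)
```

With these particular numbers, `h` takes different values for $Y=0$ and $Y=1$, while `const` is the same for both.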

Finally, I would like to point out that conditional expectations in the probability theory sense, i.e. $E[X|Y]$, are random variables. They are effectively the 'best prediction' of $X$ given $Y$.
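One common way to read 'best prediction' is in the mean-squared-error sense: among all functions $g$ of $Y$, the choice $g(y) = E[X|Y=y]$ minimises $E[(X-g(Y))^2]$. The sketch below is only a sanity check on a made-up joint distribution, comparing against a finite grid of candidate predictors; it is not a proof, and the names `mse` and `grid` are my own.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical joint pmf P(X = x, Y = y); the numbers are illustrative only.
joint = {
    (0, 0): F(1, 4), (2, 0): F(1, 4),
    (1, 1): F(1, 4), (2, 1): F(1, 4),
}

def mse(g, joint):
    """Mean squared error E[(X - g(Y))^2] for a predictor given as a dict y -> g(y)."""
    return sum((x - g[y]) ** 2 * pr for (x, y), pr in joint.items())

# Build h(y) = E[X | Y = y] from the joint pmf.
p_y, num = {}, {}
for (x, y), pr in joint.items():
    p_y[y] = p_y.get(y, 0) + pr
    num[y] = num.get(y, 0) + x * pr
h = {y: num[y] / p_y[y] for y in p_y}

# Candidate predictions 0, 1/4, ..., 2 for each value of Y (a coarse grid, not exhaustive).
grid = [F(k, 4) for k in range(9)]
y_values = sorted(p_y)
best_on_grid = min(
    mse(dict(zip(y_values, vals)), joint)
    for vals in product(grid, repeat=len(y_values))
)

print(h, mse(h, joint), best_on_grid)
```

No predictor on the grid achieves a smaller error than `h`.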

$E[X=x_i|Y]$: what should this mean? The expectation of $X=x_i$ conditional on everything you know about $Y$?! The thing before the $|$ should be a random variable, but $\{X=x_i\}$ is an event. $P(X=x_i|Y)$ or $P(X=x_i|Y=y_j)$ makes sense.

3
On

What you have written in (2) is correct:
\[ E[X \mid Y = y_j] = \sum_{i=1}^n x_i P(X = x_i \mid Y = y_j). \]

It then follows that $E[X \mid Y = y_j]$ is a function with domain $\{y_1,\ldots,y_m\}$. Let us call this function $h$, so that
\[ h(y) = E[X \mid Y = y]. \]

The object $E[X\mid Y]$ is then defined as $h(Y)$. Note, therefore, that $E[X\mid Y]$ is a random variable. (It is a function of $Y$.) Since it is a random variable, it has an expected value, so it makes sense to talk about $E[E[X\mid Y]]=E[h(Y)]$. We then get
\begin{align*}
E[E[X\mid Y]] = E[h(Y)] &= \sum_{j=1}^m h(y_j)P(Y = y_j)\\
&= \sum_{j=1}^m\sum_{i=1}^n x_i P(X = x_i \mid Y = y_j)P(Y = y_j)\\
&= \sum_{i=1}^n x_i\sum_{j=1}^m P(X = x_i, Y = y_j)\\
&= \sum_{i=1}^n x_i P(X = x_i) = E[X].
\end{align*}

What we have proven here is the general fact that $E[E[X\mid Y]]=E[X]$. I have some more information about conditional expectations of this kind in a set of notes on my web site.
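The computation above can be checked numerically. The following is a minimal Python sketch on a made-up joint distribution (the values `xs`, `ys` and the probabilities are assumptions chosen for illustration). It builds $h(y) = E[X \mid Y = y]$, then compares $\sum_j h(y_j)P(Y = y_j)$ with $E[X]$. For contrast it also evaluates the question's sum $\sum_j y_j E[X \mid Y = y_j]$, which weights by the values $y_j$ rather than by the probabilities $P(Y = y_j)$.

```python
from fractions import Fraction as F

# Hypothetical values and joint pmf P(X = x, Y = y); illustrative numbers only.
xs = [1, 2, 3]
ys = [10, 20]
joint = {
    (1, 10): F(1, 10), (2, 10): F(2, 10), (3, 10): F(1, 10),
    (1, 20): F(3, 10), (2, 20): F(1, 10), (3, 20): F(2, 10),
}

# Marginal P(Y = y).
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# h(y) = E[X | Y = y] = sum_i x_i P(X = x_i | Y = y).
h = {y: sum(x * joint[(x, y)] / p_y[y] for x in xs) for y in ys}

# E[E[X | Y]] = E[h(Y)] = sum_j h(y_j) P(Y = y_j).
tower = sum(h[y] * p_y[y] for y in ys)

# E[X] computed directly from the joint pmf.
e_x = sum(x * joint[(x, y)] for x in xs for y in ys)

# The sum from the question, weighted by y_j instead of P(Y = y_j).
weighted_by_values = sum(y * h[y] for y in ys)

print(tower, e_x, weighted_by_values)
```

Here `tower` and `e_x` agree, while `weighted_by_values` is a different number, so the weights in the outer expectation have to be $P(Y = y_j)$.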