What is the difference between $E[X\mid Y]$ and $E[X\mid Y=y]$, and what are some of the properties of $E[X \mid Y]$?


I was trying to understand, both intuitively and rigorously, the difference between $E[X\mid Y]$ and $E[X\mid Y=y]$.

Let me first tell you the things that do make sense to me. $E[X\mid Y=y]$ makes sense to me (I think). For me it means the value that we expect $X$ to have on average, given that the event $Y=y$ was observed, and it is a deterministic value:

$$E[X\mid Y=y] = \sum_x{xp_{X\mid Y}(x\mid y)} = \mu_{X\mid Y=y}$$

that can be computed like any other expectation (where the notation $\mu_{X\mid Y=y}$ denotes the actual real number that $E[X\mid Y=y]$ takes). $E[X\mid Y=y]$ is a function of $y$: given $Y=y$, the conditional expectation is always $\mu_{X\mid Y=y}$, which is determined by the specific $y$ that was observed.
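To make the formula above concrete, here is a small Python sketch that computes $E[X\mid Y=y]=\sum_x x\,p_{X\mid Y}(x\mid y)$ from a joint pmf. The joint pmf and the helper name `cond_expectation` are hypothetical; the pmf is chosen so that the conditional means come out as 11, 22, 33, the numbers used in question 1. Nothing random happens here: for each fixed $y$ the result is a single number.

```python
from fractions import Fraction

# Hypothetical joint pmf p_{X,Y}(x, y): given Y = y, X is 10*y or 12*y with
# equal probability, so E[X | Y = y] = 11*y (i.e. 11, 22, 33).
P_Y = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
JOINT = {}
for y, py in P_Y.items():
    JOINT[(10 * y, y)] = py / 2
    JOINT[(12 * y, y)] = py / 2


def cond_expectation(joint, y):
    """E[X | Y = y] = sum_x x * p_{X|Y}(x | y) for a discrete joint pmf."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)  # marginal p_Y(y)
    if p_y == 0:
        raise ValueError(f"P(Y = {y}) = 0, so E[X | Y = {y}] is undefined here")
    # p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y)
    return sum(x * p / p_y for (x, yy), p in joint.items() if yy == y)


for y in (1, 2, 3):
    print(y, cond_expectation(JOINT, y))
```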

However, $E[X\mid Y]$ makes less sense to me. I have read something similar to the following:

Notice now that $Y$ is a random variable this time. Therefore, $E[X\mid Y]$ is a random variable.

That sort of makes sense to me: if $Y$ is random, then the expected value of $X$ given $Y$ has to be random too. The statement "$E[X\mid Y]$ is a random variable" is exactly what I would like to understand more precisely; it is the part I understand least, and I would like to address it.

I feel if I really understood this concept of what $E[X\mid Y]$ really means, I should be able to answer the following questions:

1) If $E[X\mid Y]$ is random, then what are the possible values it can take? Can it only take the values $\mu_{X\mid Y=y}$ for $y$ in the range of $Y$? Say $Y$ takes values in $\{1,2,3\}$ and $U_{X,Y} = \{\mu_{X\mid Y=1}= 11, \mu_{X\mid Y=2} = 22, \mu_{X\mid Y=3} = 33 \}$. Then is there any chance that $E[X\mid Y] = 123$?

2) Is the distribution of $E[X\mid Y]$ over $U_{X,Y}$ or over $Y$?

3) What is the probability distribution of $E[X\mid Y]$ if we have all the information we need about the distributions $p_X(x)$ and $p_Y(y)$? Is it just the same as $p_Y(y)$? Is there a closed/specific formula for it?

4) When one is asked to find the distribution of $E[X\mid Y]$ are we asked to find $Pr[E[X\mid Y=y]]$ or $Pr[E[X\mid Y] = \mu]$? Is there a difference between the two? Is one nonsense while the other one is a valid probability distribution?

5) If we were to sketch the probability density function of $E[X \mid Y]$, would the horizontal axis be $y$ or $\mu$? That is, would the probability density be a function of $y$ or of $E[X\mid Y] = \mu$, i.e. would $p_{E[X\mid Y]}(k)$ be a function of $y=Y$ or of $E[X\mid Y]$?

6) Related to the above two questions, it seems to me that writing an explicit formula for $E[X\mid Y=y]$ is easy, while for $E[X\mid Y]$ it is not (or at least not for me). Is the formula $E[X\mid Y] = \sum_x{xp_{X\mid Y}(x\mid Y)}$? I would guess it is, but for me it's a very strange equation, because we are conditioning on a random variable: we are saying "given $Y$", but $Y$ is random, so it's not really given. Therefore, I can't seem to find an expression for it that makes sense to me.

Basically, it is not clear to me what $E[X\mid Y]$ means: I don't know what its valid outcomes are or what its distribution is (in relation to $p_X(x)$, $p_Y(y)$, $p_{X,Y}(x,y)$, or anything else), nor can I write an explicit formula for it that makes sense to me. I can't even decide whether $p_{E[X \mid Y]}(k)$ is a function of $y$ or of $\mu=E[X\mid Y]$.


Answer 1 (best answer)

If $\varphi(y)=E[X\mid Y=y\,]$ is the non-random function of $y$, then $E[X\mid Y\,]$ is defined to be the random variable $\varphi(Y)$.
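A minimal sketch of this definition, assuming a finite $Y$ and a hypothetical table for $\varphi$ (the values 11, 22, 33 from the question): the random variable $\varphi(Y)$ takes the value $\varphi(y)$ whenever $Y=y$, so its pmf is obtained by pushing $p_Y$ through $\varphi$ and merging any $y$'s that map to the same value. The helper `pmf_of_function` is illustrative, not a library function.

```python
from collections import defaultdict
from fractions import Fraction


def pmf_of_function(p_y, f):
    """Distribution of the random variable f(Y) when Y has pmf p_y.

    P(f(Y) = z) is the sum of p_y(y) over all y with f(y) = z, so values
    of y that f maps to the same z have their probabilities merged.
    """
    out = defaultdict(Fraction)
    for y, p in p_y.items():
        if p > 0:
            out[f(y)] += p
    return dict(out)


# Hypothetical example matching question 1: Y in {1, 2, 3} with these
# probabilities, and phi(y) = E[X | Y = y] given by the table PHI.
P_Y = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
PHI = {1: 11, 2: 22, 3: 33}

dist = pmf_of_function(P_Y, PHI.get)
print(dist)              # E[X | Y] = phi(Y) only takes the values 11, 22, 33
print(dist.get(123, 0))  # P(E[X | Y] = 123) = 0
```

In this example $E[X\mid Y]$ can only be 11, 22 or 33, each with the probability of the corresponding $Y=y$; a value such as 123 has probability zero.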

Answer 2

If $Z=E[X\mid Y]$, then $Z$ is a random variable, and its distribution function is $F_{Z}(z)=P(Z\leq z)=P(Y\in\{y\in\mathbb{R}:E[X\mid Y=y]\leq z\})$.
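The CDF formula can be evaluated directly when $Y$ is finite. This sketch uses a hypothetical example ($P(Y=1)=1/2$, $P(Y=2)=1/3$, $P(Y=3)=1/6$, and $E[X\mid Y=y]=11y$) and the illustrative name `cdf_z`. Note that the argument $z$ is a possible value of $E[X\mid Y]$, while the probabilities themselves come from $Y$.

```python
from fractions import Fraction

# Hypothetical example: pmf of Y and phi(y) = E[X | Y = y].
P_Y = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
PHI = {1: 11, 2: 22, 3: 33}


def cdf_z(z):
    """F_Z(z) = P(Y in {y : E[X | Y = y] <= z}) for Z = E[X | Y]."""
    preimage = {y for y in P_Y if PHI[y] <= z}
    return sum((P_Y[y] for y in preimage), Fraction(0))


for z in (10, 11, 21.5, 22, 40):
    print(z, cdf_z(z))
```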

If you want to think intuitively, it is very simple: to choose a random value for $Z$, first choose $Y$ randomly according to the distribution of $Y$; suppose you obtain $Y=y$. Then the value $Z$ takes is $E[X\mid Y=y]$.
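The two-stage recipe can be simulated directly. In this sketch (hypothetical probabilities for $Y$ and hypothetical values of $E[X\mid Y=y]$, using the standard library's seeded `random.Random`), only $Y$ is ever drawn; $X$ itself is never sampled, because $Z$ depends on the outcome only through $Y$. The function name `sample_z` is illustrative.

```python
import random
from collections import Counter

# Hypothetical example: P(Y = 1) = 1/2, P(Y = 2) = 1/3, P(Y = 3) = 1/6,
# and phi(y) = E[X | Y = y] takes the values 11, 22, 33.
Y_VALUES = [1, 2, 3]
Y_WEIGHTS = [1 / 2, 1 / 3, 1 / 6]
PHI = {1: 11, 2: 22, 3: 33}


def sample_z(rng):
    """Draw one value of Z = E[X | Y]: first draw Y, then apply phi."""
    y = rng.choices(Y_VALUES, weights=Y_WEIGHTS)[0]
    return PHI[y]


rng = random.Random(0)
n = 60_000
counts = Counter(sample_z(rng) for _ in range(n))
for z in sorted(counts):
    print(z, counts[z] / n)  # empirical frequencies, close to 1/2, 1/3, 1/6
```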

In fact, this is just a special case of the general construction: if $f$ is a real Borel measurable function, then $Z=f(Y)$ is the random variable with $F_{Z}(z)=P(Z\leq z)=P\big(Y\in f^{-1}((-\infty,z])\big)$. Here $y\mapsto E[X\mid Y=y]$ is a real function that takes $y$ and returns $E[X\mid Y=y]$.

The intuitive picture is slightly complicated by the possibility that this function is not injective. For example, suppose $X$ and $Y$ are independent. Then $Z$ takes the value $E[X]$ with probability $1$, regardless of how $Y$ is distributed: this is because $y\mapsto E[X\mid Y=y]$ is a constant function.
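A small check of the independent case, with hypothetical marginals $p_X$ and $p_Y$: building the joint pmf as their product makes every conditional mean equal to $E[X]=3$, so $\varphi(Y)$ puts all of its mass on a single value even though $Y$ takes three values. The names `cond_expectation`, `phi` and `dist_Z` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginals; independence means p_{X,Y}(x, y) = p_X(x) * p_Y(y).
P_X = {0: Fraction(1, 4), 4: Fraction(3, 4)}
P_Y = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
JOINT = {(x, y): px * py
         for (x, px), (y, py) in product(P_X.items(), P_Y.items())}


def cond_expectation(y):
    """E[X | Y = y] computed from the joint pmf."""
    p_y = sum(p for (_, yy), p in JOINT.items() if yy == y)
    return sum(x * p / p_y for (x, yy), p in JOINT.items() if yy == y)


E_X = sum(x * p for x, p in P_X.items())
phi = {y: cond_expectation(y) for y in P_Y}

# Distribution of Z = phi(Y): merge the probabilities of y's with equal phi(y).
dist_Z = {}
for y, p in P_Y.items():
    dist_Z[phi[y]] = dist_Z.get(phi[y], 0) + p

print(E_X, phi, dist_Z)  # phi is constant, so Z = E[X] with probability 1
```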

Answer 3

There are fundamental differences between $E[X\mid Y=y]$ and $E[X\mid Y]$. You've got the first one down: there we are conditioning on a specific event. In the latter case, you are merely told the likelihood of various events, not which ones actually happen.

Calculating the distribution of $E[X\mid Y]$ is exactly the same as calculating the distribution of $f(Y)$ for some measurable function $f$, given that you know the distribution of $Y$, since $E[X\mid Y]$ is just a special case of a function of $Y$.

Bottom line: Don't think of it as an expectation; think of it as just a function of a random variable, and find its distribution as you would for any such function.

Answer 4

You have been given good answers to your main question, so I think some technical remarks may also be of some value to you. Say you model some random phenomenon on a probability space $(\Omega,\mathscr F,\mathsf P)$ and you have random variables $X:(\Omega,\mathscr F) \to (A,\mathscr A)$ and $Y:(\Omega,\mathscr F)\to (B,\mathscr B)$. Then $\xi = \mathsf E[X\mid Y]$ is also a random variable, that is, a map $\xi:(\Omega,Y^{-1}(\mathscr B)) \to (A,\mathscr A)$, whereas $\eta(y) = \mathsf E[X\mid Y =y]$ is a map $\eta:(B,\mathscr B) \to (A,\mathscr A)$.
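On a finite sample space both objects can be written down explicitly. The sketch below assumes six equally likely outcomes, hypothetical tables for $X$ and $Y$, and the power set of the range of $Y$ as $\mathscr B$, so every event in $\sigma(Y)=Y^{-1}(\mathscr B)$ has the form $\{Y\in S\}$. Here `eta` is defined on the range of $Y$, `xi` is defined on $\Omega$ as `eta` composed with $Y$, and the loop checks the standard defining property of conditional expectation, $\mathsf E[X\mathbf 1_A]=\mathsf E[\xi\mathbf 1_A]$ for every $A\in\sigma(Y)$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite probability space with six equally likely outcomes;
# X and Y are given as lookup tables on Omega.
OMEGA = range(6)
P = {w: Fraction(1, 6) for w in OMEGA}
X = {0: 3, 1: 5, 2: 1, 3: 7, 4: 2, 5: 6}
Y = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "c"}
B = sorted(set(Y.values()))  # range of Y


def eta(y):
    """eta(y) = E[X | Y = y]: a deterministic map on the range of Y."""
    event = [w for w in OMEGA if Y[w] == y]
    return sum(X[w] * P[w] for w in event) / sum(P[w] for w in event)


def xi(w):
    """xi = E[X | Y] = eta(Y): a map on Omega, constant on each {Y = y}."""
    return eta(Y[w])


def expect(g, event):
    """E[g * 1_event] on the finite space."""
    return sum(g(w) * P[w] for w in event)


# Every event in sigma(Y) is {Y in S} for some subset S of the range of Y.
for r in range(len(B) + 1):
    for S in combinations(B, r):
        A = [w for w in OMEGA if Y[w] in S]
        assert expect(X.get, A) == expect(xi, A)
print("E[X 1_A] == E[xi 1_A] for every A in sigma(Y)")
```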

You can think of $\xi$ as $\eta(Y)$, so in fact $\eta$ may contain more information. For example, let $Y$ tell us whether the train arrives in the morning or in the evening, $B = \{m,e\}$, and let $X$ be the delay of the train. You can specify $\eta$ as the expected delay given the period of arrival, e.g. $\eta(m) = 5$ and $\eta(e) = 2$, since in the morning the situation on the railroad is more uncertain. Now, if the period of arrival $Y$ is almost surely evening, then by computing $\xi$ you will learn that $\mathsf E[X\mid Y=e] = 2$, but you'll have no clue what $\mathsf E[X\mid Y=m]$ should be.
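The train example can be encoded in a few lines. Here $P(Y=m)=0$, and `eta_1` and `eta_2` are two hypothetical choices of $\eta$ that agree in the evening but differ in the morning (the value 100 is made up purely for contrast). They induce the same distribution for $\xi=\eta(Y)$, so $\xi$ alone cannot tell you $\eta(m)$.

```python
from fractions import Fraction

# Hypothetical encoding: Y is "m" (morning) or "e" (evening), and the
# train almost surely arrives in the evening.
P_Y = {"m": Fraction(0), "e": Fraction(1)}

# Two candidate eta's that agree in the evening but differ in the morning.
eta_1 = {"m": 5, "e": 2}
eta_2 = {"m": 100, "e": 2}


def dist_of_xi(eta):
    """Distribution of xi = eta(Y), keeping only values with positive probability."""
    dist = {}
    for y, p in P_Y.items():
        if p > 0:
            dist[eta[y]] = dist.get(eta[y], 0) + p
    return dist


print(dist_of_xi(eta_1), dist_of_xi(eta_2))  # identical: xi = 2 almost surely
```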

It is also true that $\xi$ always exists, whereas the existence of $\eta$ may require some assumptions on $A$; these are nevertheless satisfied in the case $A = \Bbb R$ that you are interested in.

Answer 5

The easiest way is to write it as follows:

$$f(y) = E[X \mid Y = y] = \sum_{x \in \mathcal X} p_{X \mid Y}(x \mid y) x $$

Here $y$ is just a value, and $f$ is a function that can be evaluated; it is usually a sum or an integral. Given any value of $y$ you want, it produces a real number. It is simply a deterministic function: it is parametrized by the distributions of the random variables $X$ and $Y$, but there is no sampling going on. Once we know the value of $y$ we want, we can compute the conditional expectation in question.
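As an illustration of the integral case, this sketch assumes a hypothetical joint density $p_{X,Y}(x,y)=x+y$ on the unit square and evaluates $f(y)=\int_0^1 x\,p_{X,Y}(x,y)\,dx \big/ p_Y(y)$, which equals $\int_0^1 x\,p_{X\mid Y}(x\mid y)\,dx$, with a midpoint rule. For this density the closed form is $f(y)=(2+3y)/(3+6y)$, and the numerical value should match it closely. Each call is a plain deterministic computation; randomness enters only if you plug in a random $Y$, giving $E[X\mid Y]=f(Y)$.

```python
def joint_density(x, y):
    """Hypothetical joint density p_{X,Y}(x, y) = x + y on the unit square."""
    return x + y


def f(y, n=10_000):
    """f(y) = E[X | Y = y] = (integral of x p(x, y) dx) / p_Y(y), midpoint rule."""
    h = 1.0 / n
    xs = [(i + 0.5) * h for i in range(n)]
    numerator = sum(x * joint_density(x, y) for x in xs) * h
    marginal = sum(joint_density(x, y) for x in xs) * h  # p_Y(y) = 1/2 + y
    return numerator / marginal


def f_exact(y):
    """Closed form for this density: (1/3 + y/2) / (1/2 + y) = (2 + 3y) / (3 + 6y)."""
    return (2 + 3 * y) / (3 + 6 * y)


for y in (0.0, 0.5, 1.0):
    print(y, f(y), f_exact(y))
```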