Conditional Expectation for Random Variables


How do you prove that $E(Y|X) = E(Y|E(Y|X))$ for any random variables $X,Y$? I tried using properties of conditional expectation like $E(E(Y|X)) = E(Y)$, but I'm not making much progress.

Is there an intuitive explanation for why this is true?


Intuitively, you can think of $E(Y\mid X)$ as "our best estimate of $Y$'s value, given $X$" and it's pretty clear that "our best estimate for $Y$'s value, given our best estimate of $Y$'s value given $X$" should be the same as that estimate.

Concrete example:

  1. For the left-hand side, you are a meteorologist. You are asked: "Given that today it's $X = 5^\circ$ outside, what is your prediction for tomorrow's temperature ($Y$)?" And maybe you answer $7^\circ$. (In general, $E(Y\mid X)$ is a function that maps today's temperature $X$ to a prediction of tomorrow's.)
  2. For the right-hand side, you are a meteorologist. You are told: "Five minutes ago, I told you the temperature outside, and asked you to predict tomorrow's temperature. You answered $7^\circ$. Then I wiped your memory. Given only what you answered then, what is your prediction for tomorrow's temperature?" You should obviously (assuming I'm honest) answer $7^\circ$.

For a formal proof, take Wikipedia's definition of conditional expectation.

The characterizing property of $E(Y\mid X)$ is that it is a measurable function of $X$ satisfying, for every bounded measurable $f$, $$ \int E(Y\mid X) f(X)\,\text{d}P = \int Y f(X)\,\text{d}P. $$ In particular, for any bounded measurable $f$, $$ \int E(Y\mid X) f(E(Y\mid X))\,\text{d}P = \int Y f(E(Y\mid X))\,\text{d}P, $$ because $f(E(Y\mid X))$ is itself a bounded measurable function of $X$. So $E(Y\mid X)$ satisfies the characterizing property of $E(Y\mid E(Y \mid X))$, and hence agrees with $E(Y\mid E(Y \mid X))$ almost surely.
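As a sanity check, the defining property can be verified exactly on a small discrete distribution; the joint pmf and the test function $f$ below are made up for illustration, and the integrals reduce to finite sums:

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf of (X, Y), chosen for illustration.
pmf = {
    (0, 1): Fraction(1, 8), (0, 3): Fraction(1, 8),
    (1, 2): Fraction(2, 8),
    (2, 5): Fraction(4, 8),
}

# Compute E(Y | X = x) for each x: sum of y*p over {X = x}, divided by P(X = x).
num, den = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), p in pmf.items():
    num[x] += y * p
    den[x] += p
e_y_given_x = {x: num[x] / den[x] for x in den}

# An arbitrary (bounded on this finite space) test function f.
f = lambda t: t * t + 1

# Both sides of the defining identity, as finite sums over outcomes.
lhs = sum(e_y_given_x[x] * f(x) * p for (x, y), p in pmf.items())
rhs = sum(y * f(x) * p for (x, y), p in pmf.items())
assert lhs == rhs
```

Using `Fraction` keeps the arithmetic exact, so the two sides match identically rather than up to floating-point error.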


One "high-level" proof of this fact is as follows. The "tower property" of conditional expectation says that in general $E(E(A\mid B)\mid C) = E(A \mid C)$ when $C$ is $B$-measurable (intuitively, when knowing the value of $B$ determines the value of $C$).

In particular, $E(Y\mid X)$ is $X$-measurable. So we have $$ E(E(Y\mid X) \mid E(Y\mid X)) = E(Y \mid E(Y\mid X)) $$ by applying the tower property with $A = Y$, $B=X$, $C = E(Y\mid X)$. But $E(A\mid A)=A$ for any $A$, so $E(E(Y\mid X) \mid E(Y\mid X))$ simplifies to $E(Y\mid X)$, and we conclude that $E(Y\mid X) = E(Y \mid E(Y\mid X))$ as desired.
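The identity itself can also be checked end-to-end on a finite example. The sketch below (with a made-up joint pmf, deliberately chosen so that two distinct values of $X$ share the same value of $E(Y\mid X)$, which is where the claim is non-trivial) computes both $E(Y\mid X)$ and $E(Y\mid E(Y\mid X))$ exactly and confirms they agree at every outcome:

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf of (X, Y). X = 0 and X = 1 both give
# E(Y | X) = 2, so conditioning on E(Y | X) genuinely coarsens X.
pmf = {
    (0, 1): Fraction(1, 8), (0, 3): Fraction(1, 8),  # E(Y | X=0) = 2
    (1, 2): Fraction(2, 8),                          # E(Y | X=1) = 2
    (2, 5): Fraction(4, 8),                          # E(Y | X=2) = 5
}

def cond_exp_given(pmf, z_of):
    """Return {z: E(Y | Z = z)} where Z = z_of(x)."""
    num = defaultdict(Fraction)  # sum of y * p over {Z = z}
    den = defaultdict(Fraction)  # P(Z = z)
    for (x, y), p in pmf.items():
        z = z_of(x)
        num[z] += y * p
        den[z] += p
    return {z: num[z] / den[z] for z in den}

# Step 1: E(Y | X) as a function of x.
e_y_given_x = cond_exp_given(pmf, lambda x: x)

# Step 2: E(Y | E(Y|X)), conditioning on the value of the first estimate.
e_y_given_est = cond_exp_given(pmf, lambda x: e_y_given_x[x])

# Check: at every outcome, the two conditional expectations agree.
for (x, _), _ in pmf.items():
    assert e_y_given_est[e_y_given_x[x]] == e_y_given_x[x]
```

Here the event $\{E(Y\mid X) = 2\}$ lumps together $\{X=0\}$ and $\{X=1\}$, and averaging $Y$ over that lump still returns $2$, exactly as the tower-property argument predicts.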