Proof of Total Variance in Statistical Inference


In the book *Statistical Inference* by Casella and Berger, in the proof of Theorem 4.4.7 (Conditional variance identity), the author shows that a certain expectation equals 0 (shown in the green rectangle in the excerpt below).

However, I followed the same idea and got a different result. I want to know what is wrong with my proof (below).

Thank you!

My derivation

Consider \begin{align*} E_X\big( \big[ X-E[X|Y] \big] \big[ E[X|Y]-EX \big] \big) \end{align*}

Note that the expectation is calculated under the distribution of $X$. Also note that $E[X|Y] = g(Y)$ and $EX=\text{constant}$. So we have \begin{align*} E_X\big( \big[ X-E[X|Y] \big] \big[ E[X|Y]-EX \big] \big) &= \big[ E[X|Y]-EX \big] E_X \big[ X-E[X|Y] \big] \\ &= \big[ E[X|Y]-EX \big] \big[ EX-E[X|Y] \big] \\ &= -\big[ E[X|Y]-EX \big]^2 \\ \end{align*}

I took $\big[ E[X|Y]-EX \big]$ out of the expectation because it is a function of $Y$, not of $X$:

$$E_X[h(X) g(Y)] = \int_x h(x) g(Y) f_X(x)\, dx = g(Y) \int_x h(x) f_X(x)\, dx = g(Y) E_X[h(X)]$$
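If my derivation were correct, the cross term would equal $-\big[E[X|Y]-EX\big]^2$ and so would be negative on average whenever $E[X|Y]$ is non-degenerate. A quick Monte Carlo sanity check (a hypothetical model, not from the book: $X = Y + Z$ with $Y$, $Z$ independent standard normals, so $E[X|Y] = Y$ and $EX = 0$) suggests the cross term in fact averages to $0$, not to $-\operatorname{Var}(E[X|Y]) = -1$:

```python
import numpy as np

# Hypothetical model (not from the book): X = Y + Z with Y, Z
# independent standard normals, so E[X|Y] = Y and EX = 0.
rng = np.random.default_rng(0)
n = 1_000_000
y = rng.standard_normal(n)
z = rng.standard_normal(n)
x = y + z

cond_mean = y  # E[X|Y] = Y in this model
cross = (x - cond_mean) * (cond_mean - x.mean())

print(cross.mean())                         # close to 0, as the book claims
print(-np.mean((cond_mean - x.mean())**2))  # close to -1, the flawed derivation's value
```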


[Excerpt from Casella and Berger, *Statistical Inference*: Theorem 4.4.7 (Conditional variance identity)]


Answer

Here is how I think about it.

Step 1

Consider $\text{Var} X$, the author writes:

\begin{align*} \text{Var} X = E \big( \big[ X - EX \big]^2 \big) = E \big( \big[ X - E(X|Y) + E(X|Y) - EX \big]^2 \big) \end{align*}
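Expanding the square (a standard step, consistent with the book's proof) gives three terms; the claim boxed in green is that the middle cross term is $0$:

\begin{align*} E \big( \big[ X - E(X|Y) + E(X|Y) - EX \big]^2 \big) &= E \big( \big[ X - E(X|Y) \big]^2 \big) \\ &\quad + 2 E \big( \big[ X - E(X|Y) \big] \big[ E(X|Y) - EX \big] \big) \\ &\quad + E \big( \big[ E(X|Y) - EX \big]^2 \big) \end{align*}

Once the cross term is shown to be $0$, the first term is $E[\text{Var}(X|Y)]$ and the third is $\text{Var}(E(X|Y))$ (since $E_Y[E(X|Y)] = EX$), which yields the identity $\text{Var} X = E[\text{Var}(X|Y)] + \text{Var}(E(X|Y))$.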

The first $E$ is $E_X$. The second $E$ is not $E_X$ but $E_{X, Y}$, because the integrand now involves $Y$ through $E(X|Y)$.

Step 2

When we write $EX$, we mean $E_X X$. However, when we write $E[X-E[X|Y]]$, we actually mean $E_{X, Y}[X-E[X|Y]]$.

To see this, note that $E[X|Y]$ is $g(Y)$, so $X-g(Y)$ is $W$, just another random variable. Taking the expectation of $W$ should give a number, not a function of $Y$; that is, there should not be any randomness left.

Think about it this way. At first, you somehow know $X$'s distribution, so you can calculate $EX$. Later you realize that $X$ actually depends on $Y$, and you have a new model/understanding of $X$: when $Y=y_1$, $X$ has one distribution; when $Y=y_2$, $X$ has another distribution; and so on. You can still calculate $EX$, but this time $EX = E_Y E_{X|Y}[X|Y]$.
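As a concrete (made-up) numerical illustration of $EX = E_Y E_{X|Y}[X|Y]$, take $Y \in \{y_1, y_2\}$ with $P(Y=y_1)=0.3$ and conditional means $E[X|Y=y_1]=1$, $E[X|Y=y_2]=5$:

```python
# Hypothetical two-state example of the law of total expectation.
p_y = {"y1": 0.3, "y2": 0.7}        # marginal pmf of Y
cond_mean = {"y1": 1.0, "y2": 5.0}  # E[X | Y = y]

# EX = E_Y E_{X|Y}[X|Y] = sum over y of P(Y=y) * E[X|Y=y]
ex = sum(p_y[y] * cond_mean[y] for y in p_y)
print(ex)  # 0.3*1 + 0.7*5 = 3.8
```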

Maybe the following notation is clearer:

\begin{align*} E[X-E[X|Y]] &= E[X-g(Y)] \\ &= E_{X, Y}[X-g(Y)] \\ &= \int_x \int_y (x-g(y)) f_{X, Y}(x, y) dy dx \\ \end{align*}

You can see above that we really must integrate against the joint distribution.

Now,

\begin{align*} E_{X, Y}[X-g(Y)] &= E_Y E_{(X, Y)|Y}[X-g(Y) | Y] \\ &= E_Y E_{X|Y}[X-g(Y) | Y] \\ &= E_Y E_{X|Y}[X-E[X|Y] | Y] \\ &= E_Y \big[ E_{X|Y}[X|Y]- E[X|Y] \big] \\ &= 0 \end{align*}
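The same chain can be checked exactly on a small made-up discrete joint pmf: compute $g(y) = E[X|Y=y]$ from the joint table, then sum $(x - g(y)) f_{X, Y}(x, y)$ over the whole support:

```python
# Hypothetical discrete joint pmf f(x, y); verifies E_{X,Y}[X - g(Y)] = 0
# exactly by summing over the joint support.
joint = {  # (x, y): probability
    (0, "a"): 0.1, (1, "a"): 0.3,
    (0, "b"): 0.2, (2, "b"): 0.4,
}

# Conditional means g(y) = sum_x x f(x, y) / sum_x f(x, y)
p_y, num = {}, {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p
    num[y] = num.get(y, 0.0) + x * p
g = {y: num[y] / p_y[y] for y in p_y}

total = sum((x - g[y]) * p for (x, y), p in joint.items())
print(total)  # 0 up to floating-point error
```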