L2 Norm of Expectation


This question may be more suited for the statistics sister site, but I thought I would start here.

I'm posed with proving the following statement:

Let $X,Y$ be random variables on some probability space, with $X \in L^2$, and let $Z=E[X\mid Y]$. Prove that $$||X||_{L_2}^2 = ||X-Z||_{L_2}^2+||Z||_{L_2}^2.$$

I'm immediately a little stuck. How can I take the $L^2$ norm of an expectation? Once I figure that out, I'm sure this follows from linearity/iterated expectation, but I'm not sure how I should deal with the expectation inside the norm. Am I missing some way to arrange it nicely? I also suspect there is some super simple algebra I am missing here.

Any pointers would be appreciated, surely I am missing something obvious.


On BEST ANSWER

Big hint: $L^2$ is an inner product space. The inner product is just what you’d expect: $$\langle X, Y \rangle := \int XY dP = \Bbb{E}[XY].$$ So rewrite the LHS as $$||X||^2 = \langle X, X \rangle = \langle (X - Z) + Z, (X - Z) + Z \rangle,$$ and use properties of inner products (and, of course, conditional/iterated expectations) to make it look like the RHS.
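To make the hint concrete, here is a sketch of how the expansion plays out; showing that the cross term vanishes is the heart of the exercise:

```latex
\begin{align*}
||X||^2 &= \langle (X-Z)+Z,\ (X-Z)+Z \rangle \\
        &= ||X-Z||^2 + 2\,\Bbb{E}[(X-Z)Z] + ||Z||^2.
\end{align*}
% The cross term vanishes by the tower property, because Z is a
% function of Y and so can be pulled inside E[ . | Y]:
\begin{align*}
\Bbb{E}[(X-Z)Z] = \Bbb{E}\bigl[\Bbb{E}[(X-Z)Z \mid Y]\bigr]
               = \Bbb{E}\bigl[Z\,(\Bbb{E}[X \mid Y] - Z)\bigr] = 0.
\end{align*}
```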

Edit: $Z = \Bbb{E}[X | Y]$ may be an "expectation", but it need not be constant.
Concrete example: I enter a casino with a thousand dollars and stake a hundred dollars on a sequence of three fair bets (i.e. I win or lose a hundred dollars each bet with 50-50 probability).
Let $X$ be my final net worth, and $Y$ be the indicator r.v. of me winning the first bet in the sequence (i.e. $Y = 1$ if I win the first bet and $Y = 0$ otherwise). Then it's not hard to show that $$Z = \Bbb{E}[X | Y] = 900 + 200Y,$$ since $$\Bbb{E}[X | Y = 1] = \sum x \Bbb{P}[X = x |Y = 1] = 1100$$ and $$\Bbb{E}[X | Y = 0] = \sum x \Bbb{P}[X = x |Y = 0] = 900.$$ $Z$ is clearly a coarser random variable than $X$, that nonetheless represents $X$ to the greatest degree possible given the limited information afforded by $Y$.
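The claim $Z = 900 + 200Y$ can be checked by brute force; here is a quick sketch in plain Python, enumerating the $2^3$ equally likely bet sequences:

```python
from itertools import product

# All 2^3 equally likely sequences of three fair +/-$100 bets.
outcomes = list(product([1, -1], repeat=3))

def net_worth(bets):
    # Start with $1000; each bet wins or loses $100.
    return 1000 + 100 * sum(bets)

# Condition on Y, the indicator of winning the first bet.
win_first  = [net_worth(o) for o in outcomes if o[0] == 1]
lose_first = [net_worth(o) for o in outcomes if o[0] == -1]

e_given_win  = sum(win_first)  / len(win_first)   # E[X | Y = 1]
e_given_lose = sum(lose_first) / len(lose_first)  # E[X | Y = 0]

print(e_given_win, e_given_lose)  # 1100.0 900.0, matching Z = 900 + 200Y
```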

From personal experience, I find that conditional expectation is really hard to understand if your first exposure to it is the super abstract, measure theoretic setting (like when my graduate probability course was working out of Durrett, which has many things to recommend it but is pitched to the level of someone with lots of preexisting familiarity with the basics).
It's probably better to start with conditional expectation for discrete r.v.'s and then look at the continuous case after that. Here are a couple sources that do that nicely. I also TA'd for an undergrad probability class that worked out of Bertsekas and Tsitsiklis' book, which I would recommend.
Wikipedia has a long list of relevant properties of conditional expectation as well, and also discusses its interpretation as the orthogonal projection of $X$ onto the subspace of $L^2$ spanned by all measurable functions of $Y$ (which is the substance of what this exercise asks you to prove).


My attempt at a proof:

\begin{align}
||X-Z||_2^2 + ||Z||_2^2 &= E[(X-Z)^2] + E[Z^2]\\
&= E[X^2 - 2XZ + Z^2] + E[Z^2]\\
&= E[X^2] - 2E[XZ] + 2E[Z^2]\\
&= E[X^2] - 2ZE[X] + 2ZE[Z] \text{ ******}\\
&= E[X^2] - 2E[X|Y]E[X] + 2E[X|Y]E[E[X|Y]]\\
&= E[X^2] - 2E[X|Y]E[X] + 2E[X|Y]E[X]\\
&= E[X^2]\\
&= ||X||_2^2
\end{align}

The last term in the line with the asterisks is what I am concerned about: am I able to pull one $Z$ out and leave the other (since $Z$ is constant... right?)?
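For the record, $Z$ is generally not constant, so pulling it out of an unconditional expectation is not valid as written. One possible repair of the asterisked step, assuming the usual tower and pull-out properties of conditional expectation:

```latex
% Z is Y-measurable, so it can be pulled out of a conditional
% expectation given Y (but not out of an unconditional one):
\begin{align*}
E[XZ] &= E\bigl[E[XZ \mid Y]\bigr] = E\bigl[Z\,E[X \mid Y]\bigr] = E[Z^2],
\end{align*}
% so -2E[XZ] + 2E[Z^2] = 0 and the chain collapses to E[X^2].
```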