Exercise 2.8 of Pattern Recognition and Machine Learning by C. M. Bishop


I am trying to solve exercise 2.8 of the book "Pattern Recognition and Machine Learning" by C. M. Bishop, but I am stuck. The question is as follows.

Consider two variables $x$ and $y$ with joint distribution $p(x,y)$. Prove the following two results \begin{align} \text{E}[x] &= \text{E}_y[\text{E}_x[x|y]] \tag{1}\label{1}\\ \text{var}[x] &= \text{E}_y[\text{var}_x[x|y]] + \text{var}_y[\text{E}_x[x|y]]. \tag{2}\label{2} \end{align} Here $\text{E}_x[x|y]$ denotes the expectation of $x$ under the conditional distribution $p(x|y)$, with a similar notation for the conditional variance.

I think I succeeded in proving \eqref{1}. Since one can write $\text{E}_x[x|y]=\int x p(x|y) \text{d}x$ and $\text{E}_y[f(y)]=\int f(y) p(y) \text{d}y$ for any function $f$ of $y$ (all integrals run from $-\infty$ to $\infty$; I omit the limits to keep the notation uncluttered), we get \begin{align} \text{E}_y[\text{E}_x[x|y]] &= \int \int x p(x|y) \text{d}x \, p(y) \text{d}y \\ & = \int \int x p(x|y) p(y) \text{d}x \text{d}y \\ & = \int \int x p(x,y) \text{d}x \text{d}y \\ & = \text{E}[x]. \end{align}

For the second part, I started by writing \begin{align} \text{E}_y[\text{var}_x[x|y]] &= \int \int (x - \text{E}_x[x|y])^2 p(x|y) \text{d}x \, p(y) \text{d}y, \\ \text{var}_y[\text{E}_x[x|y]] &= \int (\text{E}_x[x|y]-\text{E}_y[\text{E}_x[x|y]])^2 p(y) \text{d}y. \end{align} The first part (i.e., $\text{E}[x] = \text{E}_y[\text{E}_x[x|y]]$) can be used here, but I do not see how the sum of $\text{E}_y[\text{var}_x[x|y]]$ and $\text{var}_y[\text{E}_x[x|y]]$ leads to $\text{var}[x]$. Can someone help me prove \eqref{2}?


Best answer

Thanks to ClementC's comment, I can answer the question myself :). Starting from the right-hand side of $(2)$, we get:

$$ \text{E}_y[\text{var}_x[x|y]]+\text{var}_y[\text{E}_x[x|y]] = \text{E}_y[\text{E}_x[(x-\text{E}_x[x|y])^2|y]]+\text{E}_y[(\text{E}_x[x|y]-\text{E}_y[\text{E}_x[x|y]])^2]. $$

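Now using the fact that $\text{E}[(x-\text{E}[x])^2]=\text{E}[x^2]-(\text{E}[x])^2$, applied once to the conditional distribution $p(x|y)$ inside the first term and once to $\text{E}_x[x|y]$ viewed as a function of $y$ in the second term, gives: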

$$ \text{E}_y[\text{E}_x[x^2|y]]-\text{E}_y[(\text{E}_x[x|y])^2]+\text{E}_y[(\text{E}_x[x|y])^2]-(\text{E}_y[\text{E}_x[x|y]])^2. \tag{3}\label{3} $$

The first term simplifies to $\text{E}[x^2]$ by exactly the same reasoning used to prove the first equation (see the question), with $x^2$ in place of $x$. The second and third terms cancel each other out. The fourth term equals $(\text{E}[x])^2$ by equation (1). As a result, \eqref{3} equals $$ \text{E}[x^2] - (\text{E}[x])^2 = \text{var}[x] $$ which proves equation (2) of the question.
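As a sanity check on both identities (not part of the proof), here is a minimal Python sketch that evaluates each side exactly on a small discrete joint distribution. The table `joint` and the helper `check_total_variance` are made up for illustration. Sums replace the integrals, and `fractions.Fraction` keeps the arithmetic exact so the two sides can be compared with `==`.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(x, y); the numbers are arbitrary and only illustrative.
joint = {
    (0, 0): F(1, 10), (1, 0): F(2, 10), (2, 0): F(1, 10),
    (0, 1): F(3, 10), (1, 1): F(1, 10), (2, 1): F(2, 10),
}

def check_total_variance(joint):
    """Return (E[x], E_y[E_x[x|y]], var[x], E_y[var_x[x|y]] + var_y[E_x[x|y]])."""
    # Marginal p(y), obtained by summing the joint over x.
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p

    # Conditional mean and variance of x for each value of y,
    # using var_x[x|y] = E_x[x^2|y] - (E_x[x|y])^2.
    cond_mean, cond_var = {}, {}
    for y0, py in p_y.items():
        m1 = sum(x * p for (x, y), p in joint.items() if y == y0) / py
        m2 = sum(x * x * p for (x, y), p in joint.items() if y == y0) / py
        cond_mean[y0] = m1
        cond_var[y0] = m2 - m1 ** 2

    # Left-hand sides, computed directly from the joint distribution.
    mean_x = sum(x * p for (x, y), p in joint.items())
    var_x = sum(x * x * p for (x, y), p in joint.items()) - mean_x ** 2

    # Right-hand sides of (1) and (2).
    e_cond_mean = sum(cond_mean[y] * py for y, py in p_y.items())
    e_cond_var = sum(cond_var[y] * py for y, py in p_y.items())
    var_cond_mean = (
        sum(cond_mean[y] ** 2 * py for y, py in p_y.items()) - e_cond_mean ** 2
    )
    return mean_x, e_cond_mean, var_x, e_cond_var + var_cond_mean

mean_x, e_cond_mean, var_x, rhs = check_total_variance(joint)
print(mean_x == e_cond_mean, var_x == rhs)
```

Both comparisons should hold for any valid table in which every value of $y$ has non-zero probability, since the derivation above never uses the particular form of $p(x,y)$.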