Why is $x_1$ independent of $x_2$ for a joint Gaussian with $0$ correlation?


I am reading Murphy's Machine Learning: A Probabilistic Perspective, and in Section 4.3.2.1 it says that for a joint Gaussian of two zero-mean random variables, say $y$ and $x$, if the correlation is $0$, they are independent. It then goes on to say that if you know a $y$ value, the conditional pdf of $x$ given $y$ has the same variance as the marginal pdf of $x$. This implies that learning information about $y$ tells us nothing about $x$. However, when I draw a joint Gaussian and take a slice at some fixed $y$ value, it appears the distribution over $x$ gets tighter even when the correlation is $0$. Why does our uncertainty about $x$ not decrease after learning something about $y$? Additionally, why are $x$ and $y$ independent? One potential answer is that after we normalize the slice of the Gaussian, the conditional pdf of $x$ given $y$ looks just like the marginal pdf of $x$.


Best Answer

A random vector $X=(X_1,\ldots,X_n)$ is, by definition, jointly Gaussian, written $\mathcal{N}(\mu,\Sigma)$ with mean vector $\mu$ and covariance matrix $\Sigma$, if and only if it has pdf $$f(x_1,\ldots,x_n)=\frac{\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)}{\sqrt{(2\pi)^n|\Sigma|}}.$$

In the case that $X_1,\ldots,X_n$ are pairwise uncorrelated, $\Sigma$ is a diagonal matrix, so the quadratic form in the exponent reduces to the sum $\sum_i (x_i-\mu_i)^2/\sigma_i^2$. The exponential of a sum is a product of exponentials, so the pdf factors into a product of one-dimensional Gaussian pdfs, giving independence.
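As a quick numerical sketch of this factorization (assuming NumPy and SciPy are available), the joint pdf with a diagonal covariance evaluates to the product of the two marginal pdfs:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Zero-mean bivariate Gaussian with zero correlation: diagonal covariance.
mu = np.zeros(2)
Sigma = np.diag([1.0, 4.0])  # var(X1) = 1, var(X2) = 4, cov = 0

# Evaluate the joint pdf at a few points...
pts = np.array([[0.0, 0.0], [1.0, -2.0], [0.5, 3.0]])
joint = multivariate_normal(mean=mu, cov=Sigma).pdf(pts)

# ...and compare with the product of the one-dimensional marginal pdfs.
product = norm(0, 1).pdf(pts[:, 0]) * norm(0, 2).pdf(pts[:, 1])

print(np.allclose(joint, product))  # True: the pdf factors
```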

Your second question is a general question about independent random variables. Let $X,Y$ be independent (for convenience, continuous) random variables with pdfs $f_X,f_Y$, respectively. Then the joint pdf is $f_{X,Y}(x,y)=f_X(x)f_Y(y)$, so the conditional pdf of $X$ given $Y=y$ is $$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}=f_X(x).$$ Similarly, the marginal pdf of $X$ is just $$\int f_{X,Y}(x,y)\,dy = f_X(x)\int f_Y(y)\,dy=f_X(x).$$

The $y$ in the conditional is assumed to come from $\mathcal{Y}:=\{y\in \mathbb{R}:f_Y(y)>0\}$ so that we avoid dividing by zero. This isn't an issue in the Gaussian case, since a Gaussian pdf is strictly positive everywhere.

Your proposed explanation is exactly the correct one: the joint pdf has level sets that are ellipses, and each horizontal slice gets shorter as you move away from the mean along the $y$-axis. But because the joint pdf has the form $f_X(x)f_Y(y)$, a slice at a fixed $y$ is just $f_X(x)$ scaled by the constant $f_Y(y)$. Once you divide (normalize) by that scaling factor $f_Y(y)$, every slice becomes the same curve, namely the marginal $f_X(x)$.
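This normalization argument can also be checked numerically. The sketch below (again assuming SciPy) slices a standard bivariate Gaussian at two fixed $y$ values; the raw slices have different heights, but dividing each by $f_Y(y)$ recovers the same marginal pdf of $x$:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Standard bivariate Gaussian with zero correlation (identity covariance).
rv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.0], [0.0, 1.0]])
xs = np.linspace(-5, 5, 1001)

for y_fixed in [0.0, 1.5]:
    # Un-normalized slice of the joint pdf at y = y_fixed;
    # it is shorter for larger |y_fixed|.
    slice_vals = rv.pdf(np.column_stack([xs, np.full_like(xs, y_fixed)]))
    # Dividing by the scaling factor f_Y(y) recovers the marginal of x.
    normalized = slice_vals / norm(0, 1).pdf(y_fixed)
    print(np.allclose(normalized, norm(0, 1).pdf(xs)))  # True for every slice
```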