Confusion about the expected distance between two Gaussian random variables


This Math Stack Exchange answer gives a very nice formula for the expected squared L2 norm of a Gaussian random vector $Z \sim N(\mu, \Sigma)$:

$$ \mathbb{E}\{\|Z\|^2\} = \| \mu \|^2 + Tr(\Sigma)$$

I walked through the derivation and it all makes sense to me, but I'm having trouble reconciling it with my intuition about the Gaussian distribution. As I understand it, we can visualize the covariance $\Sigma$ as an ellipsoid whose principal axes point along the eigenvectors of $\Sigma$, with each eigenvalue giving the variance along its axis (so the axis length scales with the square root of the eigenvalue). Also, the $i$-th diagonal element of $\Sigma$ is the variance of $Z_i$.

So $Tr(\Sigma)$ is the sum of the variances of the $Z_i$, which makes sense. However, we should also be able to rotate the Gaussian distribution about its mean and still get the same value for $\mathbb{E}\{\|Z\|^2\}$. But the rotation changes the individual diagonal entries of $\Sigma$, so wouldn't it also change $Tr(\Sigma)$?

To concisely summarize: Why is $Tr(\Sigma)$ invariant to rotations about the mean?


There are 2 best solutions below

Best answer

The question has been answered, but I would like to point out that your identity has nothing to do with the Gaussian distribution: it holds for any random vector $Z$ with finite second moments. It is a simple consequence of the bias-variance decomposition of the mean squared error:

$$E[\|\hat\theta-\theta\|^2]=\|E[\hat\theta]-\theta\|^2+ tr(V(\hat\theta)),$$

where $\hat\theta$ is a random vector, thought of as an estimator of a fixed parameter $\theta$, and $V(\hat\theta)$ is its covariance matrix. Your identity is the case $\hat\theta=Z\sim N(\mu,\Sigma)$, $\theta=0$.
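
As a quick sanity check of this special case, here is a small worked example. The numbers are hypothetical and chosen only for illustration:

$$\mu=\begin{pmatrix}1\\2\end{pmatrix},\quad \Sigma=\begin{pmatrix}2&1\\1&3\end{pmatrix}\quad\Longrightarrow\quad \|\mu\|^2+tr(\Sigma)=(1+4)+(2+3)=10.$$

Computing coordinate by coordinate gives the same value: $E[Z_1^2]=V(Z_1)+E[Z_1]^2=2+1=3$ and $E[Z_2^2]=V(Z_2)+E[Z_2]^2=3+4=7$, so $E[\|Z\|^2]=3+7=10$. Note that the off-diagonal covariance never enters.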


Here is a simple proof to convince you. First, recall some properties:

  1. $V(X)=E[XX']-E[X]E[X']$
  2. $tr(A+B)=tr(A)+tr(B)$
  3. $tr(AB)=tr(BA)$.

Then we have

$$\begin{align}
tr(V(\hat\theta))&=tr(V(\hat\theta-\theta))\\
&=tr(E[(\hat\theta-\theta)(\hat\theta-\theta)']-E[\hat\theta-\theta]E[(\hat\theta-\theta)'])\quad (1)\\
&=tr(E[(\hat\theta-\theta)(\hat\theta-\theta)'])-tr(E[\hat\theta-\theta]E[(\hat\theta-\theta)'])\quad (2)\\
&=tr(E[(\hat\theta-\theta)'(\hat\theta-\theta)])-tr(E[\hat\theta-\theta]'E[\hat\theta-\theta])\quad (3)\\
&=E[\|\hat\theta-\theta\|^2]-\|E[\hat\theta]-\theta\|^2\\
\implies E[\|\hat\theta-\theta\|^2]&=\|E[\hat\theta]-\theta\|^2+ tr(V(\hat\theta))
\end{align}$$

The first line uses that $\theta$ is a constant, so subtracting it does not change the covariance. Step (3) also uses that the trace is linear and therefore commutes with the expectation, so property 3 can be applied inside $E[\cdot]$.
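
To see that the identity does not depend on Gaussianity, one can check it numerically on a deliberately non-Gaussian vector. The following is a minimal sketch using NumPy. The mixing matrix `A`, shift `b`, and exponential scales are hypothetical values picked for illustration, not anything from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical non-Gaussian example: independent exponential coordinates X,
# mixed by a fixed matrix A and shifted by b, so that Z = A X + b.
A = np.array([[1.0, 0.5],
              [0.0, 2.0]])
b = np.array([1.0, -3.0])
scales = np.array([1.0, 2.0])  # Exp(scale): mean = scale, variance = scale**2

n = 1_000_000
X = rng.exponential(scale=scales, size=(n, 2))  # one sample per row
Z = X @ A.T + b                                 # row-wise A x + b

# Exact mean and covariance of Z, from the moments of X.
mu = A @ scales + b
Sigma = A @ np.diag(scales**2) @ A.T

lhs = np.mean(np.sum(Z**2, axis=1))  # Monte Carlo estimate of E||Z||^2
rhs = mu @ mu + np.trace(Sigma)      # ||mu||^2 + tr(Sigma)

print(f"Monte Carlo E||Z||^2   : {lhs:.3f}")
print(f"||mu||^2 + tr(Sigma)   : {rhs:.3f}")
```

The two printed numbers should agree up to Monte Carlo error, which shrinks as `n` grows.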

Answer

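Rotating $Z$ about its mean, $Z \mapsto \mu + U(Z-\mu)$, replaces $\Sigma$ with $U \Sigma U^\top$ for some rotation matrix $U$. By the cyclic property of the trace and $U^\top U = I$, $\operatorname{Tr}(U \Sigma U^\top) = \operatorname{Tr}(\Sigma U^\top U) = \operatorname{Tr}(\Sigma)$. The individual diagonal entries do change, but their sum does not: it always equals the sum of the eigenvalues of $\Sigma$.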
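
A short numerical illustration of this, as a sketch with NumPy. The covariance matrix and the rotation angle are arbitrary values assumed for the example.

```python
import numpy as np

# An arbitrary (illustrative) covariance matrix: symmetric positive definite.
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

# 2-D rotation by an arbitrary angle phi (in radians).
phi = 0.7
U = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

# Covariance after rotating the distribution about its mean.
Sigma_rot = U @ Sigma @ U.T

print("variances before:", np.diag(Sigma))
print("variances after :", np.diag(Sigma_rot))   # individual entries change
print("trace before    :", np.trace(Sigma))
print("trace after     :", np.trace(Sigma_rot))  # their sum does not
print("eigenvalues before:", np.linalg.eigvalsh(Sigma))
print("eigenvalues after :", np.linalg.eigvalsh(Sigma_rot))  # unchanged
```

The rotation redistributes variance between the coordinates, while the trace and the eigenvalues stay the same up to floating-point rounding.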