Questions Leading From Application of Orthogonal Change of Coordinates to Transform a General Gaussian PDF

My textbook says the following:

Given a vector $\mathrm{\mathbf{x}}$ of random variables $x_i$ for $i = 1, \dots, N,$ with mean $\bar{\mathrm{\mathbf{x}}} = E[\mathrm{\mathbf{x}}]$, where $E[\cdot]$ denotes the expected value, and $\Delta \mathrm{\mathbf{x}} = \mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}}$, the covariance matrix $\Sigma$ is an $N \times N$ matrix given by

$$\Sigma = E[\Delta \mathrm{\mathbf{x}} \Delta \mathrm{\mathbf{x}}^T]$$

so that $\Sigma_{i j} = E[ \Delta x_i \Delta x_j]$. The diagonal entries of the matrix $\Sigma$ are the variances of the individual variables $x_i$, whereas the off-diagonal entries are the cross-covariance values.

The variables $x_i$ are said to conform to a joint Gaussian distribution if the probability distribution of $\mathrm{\mathbf{x}}$ is of the form

$$P(\bar{\mathrm{\mathbf{x}}} + \Delta \mathrm{\mathbf{x}}) = (2 \pi) ^{-N/2} \det(\Sigma^{-1})^{1/2} \exp(-(\Delta \mathrm{\mathbf{x}})^T \Sigma^{-1} (\Delta \mathrm{\mathbf{x}})/2) \tag{A2.1}$$

for some positive-definite matrix $\Sigma^{-1}$.

$\vdots$

Change of coordinates. Since $\Sigma$ is symmetric and positive-definite, it may be written as $\Sigma = U^TDU$, where $U$ is an orthogonal matrix and $D = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_N^2)$ is diagonal. Writing $\mathrm{\mathbf{x}}' = U \mathrm{\mathbf{x}}$ and $\bar{\mathrm{\mathbf{x}}}' = U \bar{\mathrm{\mathbf{x}}}$, and substituting in (A2.1), leads to

$$ \begin{align*}\exp(-(\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})^T \Sigma^{-1} (\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})/2) &= \exp(-(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')^T U \Sigma^{-1} U^T (\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')/2) \\ &= \exp(-(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')^T D^{-1} (\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')/2) \end{align*}$$

Thus, the orthogonal change of coordinates from $\mathrm{\mathbf{x}}$ to $\mathrm{\mathbf{x}}' = U \mathrm{\mathbf{x}}$ transforms a general Gaussian PDF into one with diagonal covariance matrix. A further scaling by $\sigma_i$ in each coordinate direction may be applied to transform it to an isotropic Gaussian distribution. Equivalently stated, a change of coordinates may be applied to transform Mahalanobis distance to ordinary Euclidean distance.

Appendix 2, Multiple View Geometry in Computer Vision by Hartley and Zisserman.
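The diagonalization step in the quoted passage is easy to check numerically. The sketch below (assuming NumPy is available; the $3 \times 3$ matrix `Sigma` is a made-up example) eigendecomposes a covariance matrix into $\Sigma = U^T D U$ and verifies that $U \Sigma U^T = D$, i.e. that the covariance of $\mathrm{\mathbf{x}}' = U \mathrm{\mathbf{x}}$ is diagonal.

```python
import numpy as np

# A hypothetical symmetric positive-definite covariance matrix.
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

# Eigendecomposition: np.linalg.eigh returns Sigma = V diag(w) V^T,
# so with U = V^T we get Sigma = U^T D U as in the book's notation.
eigvals, V = np.linalg.eigh(Sigma)
U = V.T
D = np.diag(eigvals)

# U is orthogonal and the decomposition reproduces Sigma.
assert np.allclose(U @ U.T, np.eye(3))
assert np.allclose(U.T @ D @ U, Sigma)

# Under x' = U x, the covariance of x' is U Sigma U^T = D (diagonal).
assert np.allclose(U @ Sigma @ U.T, D)
```

Since the eigenvalues $\sigma_i^2$ of a positive-definite $\Sigma$ are positive, $D$ here is exactly the diagonal matrix $\operatorname{diag}(\sigma_1^2, \dots, \sigma_N^2)$ from the passage.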

I'm having trouble understanding the following section:

Thus, the orthogonal change of coordinates from $\mathrm{\mathbf{x}}$ to $\mathrm{\mathbf{x}}' = U \mathrm{\mathbf{x}}$ transforms a general Gaussian PDF into one with diagonal covariance matrix. A further scaling by $\sigma_i$ in each coordinate direction may be applied to transform it to an isotropic Gaussian distribution. Equivalently stated, a change of coordinates may be applied to transform Mahalanobis distance to ordinary Euclidean distance.

  1. It says that the orthogonal change of coordinates from $\mathrm{\mathbf{x}}$ to $\mathrm{\mathbf{x}}' = U \mathrm{\mathbf{x}}$ transforms a general Gaussian PDF into one with diagonal covariance matrix. But in the final expression we have $D^{-1}$, whereas, if I'm not mistaken, the diagonal covariance matrix is $D = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_N^2)$; so $D^{-1}$ is not the diagonal covariance matrix but its inverse. How is it, then, that the orthogonal change of coordinates transforms a general Gaussian PDF into one with diagonal covariance matrix? Isn't it the case that it transforms the PDF into one with the inverse of the diagonal covariance matrix?

  2. It says that a further scaling by $\sigma_i$ in each coordinate direction may be applied to transform it to an isotropic Gaussian distribution. My search for information on what an isotropic Gaussian distribution is led me to this question, where it is stated that an isotropic Gaussian distribution is one whose covariance matrix has the simplified form $\Sigma = \sigma^2 I$. Again, how does scaling $\exp(-(\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')^T D^{-1} (\mathrm{\mathbf{x}}' - \bar{\mathrm{\mathbf{x}}}')/2)$ by $\sigma_i$ transform the general Gaussian PDF into an isotropic Gaussian distribution? I don't see where $\Sigma = \sigma^2 I$ comes from.

  3. I know that the Mahalanobis distance is $|| \mathrm{\mathbf{X}} - \mathrm{\mathbf{Y}}||_{\Sigma} = ((\mathrm{\mathbf{X}} - \mathrm{\mathbf{Y}})^T \Sigma^{-1}(\mathrm{\mathbf{X}} - \mathrm{\mathbf{Y}}))^{1/2}$, but it doesn't seem to be the same as any of the expressions above (although it is obviously similar). And where is the Euclidean distance that is mentioned? My research turned up the Euclidean distance matrix, but I also do not see how that is part of any of the above expressions.

I would greatly appreciate it if people could please take the time to clarify these points.

Best answer:
  1. You're misreading the claim about the covariance matrix. The statement is not about whether $D$ or $D^{-1}$ appears in the exponent; it is that, in the new coordinates, the covariance matrix of $\mathrm{\mathbf{x}}'$ is the diagonal matrix $D$. Note that in (A2.1) it is the inverse covariance $\Sigma^{-1}$ that sits in the exponent, so seeing $D^{-1}$ in the exponent after the change of coordinates is exactly what it means for the new covariance matrix to be $D$.
  2. The point is that a suitable scaling makes the quadratic form in the exponent reduce to a sum of squares, all with the same coefficient (easiest to see when the scaling gives $\Sigma = I$). Explicitly, if $y_i := (x_i' - \bar{x}_i')/\sigma_i = \sqrt{(D^{-1})_{ii}}\,(x_i' - \bar{x}_i')$, then the $\mathrm{\mathbf{y}}$-space pdf is proportional to $\exp(-\mathrm{\mathbf{y}}^T\mathrm{\mathbf{y}}/2)$, which is an isotropic Gaussian with covariance $\Sigma = I$, i.e. $\sigma^2 I$ with $\sigma^2 = 1$.
  3. The pdf (A2.1) is proportional to $\exp(-\tfrac{1}{2}\Vert \mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}}\Vert_\Sigma^2)$: the exponent is exactly half the squared Mahalanobis distance between $\mathrm{\mathbf{x}}$ and its mean (take $\mathrm{\mathbf{X}} = \mathrm{\mathbf{x}}$ and $\mathrm{\mathbf{Y}} = \bar{\mathrm{\mathbf{x}}}$ in your formula). The Euclidean distance is the $\Sigma = I$ special case of the Mahalanobis distance, so after the scaling in point 2 the Mahalanobis distance in $\mathrm{\mathbf{x}}$-coordinates becomes the ordinary Euclidean distance in $\mathrm{\mathbf{y}}$-coordinates.
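Points 2 and 3 can be illustrated together with a short numerical sketch (assuming NumPy; `Sigma`, `mean`, and the sample point are made-up values). It applies the whitening map $\mathrm{\mathbf{y}} = D^{-1/2} U (\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})$ and checks that the Mahalanobis distance of $\mathrm{\mathbf{x}}$ from the mean equals the Euclidean norm of $\mathrm{\mathbf{y}}$, and that the covariance of $\mathrm{\mathbf{y}}$ is the identity.

```python
import numpy as np

# A hypothetical covariance matrix and mean.
Sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
mean = np.array([1.0, -2.0])

# Diagonalize Sigma = U^T D U, then build the whitening transform
# W = D^{-1/2} U (the orthogonal rotation followed by scaling by 1/sigma_i).
eigvals, V = np.linalg.eigh(Sigma)
U = V.T
W = np.diag(1.0 / np.sqrt(eigvals)) @ U

# An arbitrary sample point and its whitened coordinates.
x = np.array([2.5, 0.5])
y = W @ (x - mean)

# Mahalanobis distance of x from the mean equals the Euclidean norm of y.
mahalanobis = np.sqrt((x - mean) @ np.linalg.inv(Sigma) @ (x - mean))
assert np.isclose(mahalanobis, np.linalg.norm(y))

# The covariance of y is W Sigma W^T = I, the isotropic case.
assert np.allclose(W @ Sigma @ W.T, np.eye(2))
```

The two assertions are just points 2 and 3 in matrix form: $W \Sigma W^T = D^{-1/2} U \Sigma U^T D^{-1/2} = D^{-1/2} D D^{-1/2} = I$, and $\Vert\mathrm{\mathbf{y}}\Vert^2 = (\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})^T \Sigma^{-1} (\mathrm{\mathbf{x}} - \bar{\mathrm{\mathbf{x}}})$.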