For a symmetric positive-definite matrix $\Sigma \in \mathbb{R}^{n \times n}$ (positive definite so that $\Sigma^{-1}$ exists), I am trying to gain some intuition about the difference between these two expressions.
$z^\top \Sigma^{-1} z$
Where $z \sim N(0, \Sigma)$. I am quite comfortable with the expression $z^\top \Sigma^{-1} z$ as it usually shows up in things like Gaussians. I have the general intuition that this can be interpreted as, $$ z^\top \Sigma^{-1} z = z^\top Q\Lambda^{-\frac{1}{2}}\Lambda^{-\frac{1}{2}}Q^\top z = x^\top x $$
Where $\Sigma = Q\Lambda Q^\top$ is the eigendecomposition of $\Sigma$ and $x = \Lambda^{-\frac{1}{2}}Q^\top z$. If $z$ is correlated according to the covariance $\Sigma$, then $\Lambda^{-\frac{1}{2}}Q^\top z$ can be seen as decorrelating the dimensions of $z$ by rotating and scaling, thus returning the covariance ellipse back to a circle in the Euclidean basis, and then taking the Euclidean distance.
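A quick numerical sanity check of this whitening picture (a sketch with an arbitrary illustrative covariance; the variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary positive-definite covariance for illustration.
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 3 * np.eye(3)

# Eigendecomposition Sigma = Q Lambda Q^T.
lam, Q = np.linalg.eigh(Sigma)

# Draw correlated samples z ~ N(0, Sigma), one per row.
z = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)

# Whiten: x = Lambda^{-1/2} Q^T z (row-wise: x = z Q Lambda^{-1/2}).
x = z @ Q @ np.diag(lam ** -0.5)

# The whitened samples should have covariance approximately I ...
print(np.cov(x.T))

# ... and z^T Sigma^{-1} z should equal x^T x sample by sample.
quad = np.einsum('ij,jk,ik->i', z, np.linalg.inv(Sigma), z)
print(np.allclose(quad, np.einsum('ij,ij->i', x, x)))
```

So the Mahalanobis distance of $z$ really is just the Euclidean length of its whitened counterpart.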
$x^\top \Sigma x$
Where $x \sim N(0, I)$. I have more trouble understanding what is happening here geometrically, because it seems like we are taking some Gaussian noise in the unit circle and then correlating it according to the covariance,
$$ x^\top \Sigma x = x^\top Q\Lambda^{\frac{1}{2}}\Lambda^{\frac{1}{2}}Q^\top x = z^\top z $$
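The reverse direction can be checked numerically as well. One hedge: any square root of $\Sigma$ makes the quadratic-form identity hold, but in this sketch I use the symmetric square root $Q\Lambda^{\frac{1}{2}}Q^\top$ so that the coloured samples $z$ genuinely have covariance $\Sigma$ (with $\Lambda^{\frac{1}{2}}Q^\top$ alone they would have covariance $\Lambda$ instead):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative positive-definite covariance.
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 3 * np.eye(3)
lam, Q = np.linalg.eigh(Sigma)

# Start from white noise x ~ N(0, I), one sample per row.
x = rng.standard_normal((100_000, 3))

# Colour it with the symmetric square root: z = Q Lambda^{1/2} Q^T x.
sqrt_Sigma = Q @ np.diag(lam ** 0.5) @ Q.T
z = x @ sqrt_Sigma

# For each sample, x^T Sigma x equals z^T z ...
quad = np.einsum('ij,jk,ik->i', x, Sigma, x)
print(np.allclose(quad, np.einsum('ij,ij->i', z, z)))

# ... and the coloured samples have covariance approximately Sigma.
print(np.cov(z.T))
```

So $x^\top \Sigma x$ measures the Euclidean length the white noise would have after being stretched onto the covariance ellipse.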
Questions
- If what I have derived so far is correct, why would one want to compute the dot product $z^\top z$? Can you describe any practical reason why this would be useful?
- Can you provide some examples of where and why this is used? (I have two examples which inspired this question: the "Bayesian Linear Regression - With Conjugate Priors" section (Equation 2), and the paper titled "Revisiting natural gradient for deep networks", equations 4 and 5.)
- Anything else interesting to say about these forms?