The multivariate Gaussian distribution has the probability density function:
$$ \frac{1}{\sqrt{(2\pi)^k|\mathbf{\Sigma}|}} \exp\left(-\frac12(\mathbf{x}-\boldsymbol{\mu})^\text{T}\mathbf{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right) $$
(Original at https://i.stack.imgur.com/pVm0K.png)
When comparing the univariate and the multivariate p.d.f.s, it is interesting to notice the following two facts:
- Dividing by sigma squared in the exponent is replaced by multiplying by the inverse of the covariance matrix
- The normalization factor in front of the exponent now contains the determinant of the covariance matrix
The second fact is somewhat easier to understand: normalizing the area under the p.d.f. now requires the determinant of the covariance matrix, since it is the "magnitude" of the space transformation described by the covariance matrix.
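As a rough numerical illustration of the second fact, here is a minimal sketch that assumes a diagonal covariance matrix, so the determinant is just the product of the variances. The function names and the example values are hypothetical. It checks that the multivariate density with the determinant in the normalization factor matches the product of the univariate densities.

```python
import math

def univariate_pdf(x, mu, sigma2):
    """Density of a 1-D normal with mean mu and variance sigma2."""
    return math.exp(-0.5 * (x - mu) ** 2 / sigma2) / math.sqrt(2 * math.pi * sigma2)

def diagonal_mvn_pdf(x, mu, variances):
    """Multivariate normal density, assuming a diagonal covariance matrix.

    For a diagonal matrix the determinant is the product of the diagonal
    entries, and the inverse divides each squared deviation by its variance.
    """
    k = len(x)
    det = math.prod(variances)
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, variances))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** k * det)

# Arbitrary example values (assumptions, not from the question).
x = [0.3, -1.2]
mu = [0.0, 1.0]
variances = [2.0, 0.5]

joint = diagonal_mvn_pdf(x, mu, variances)
product = (univariate_pdf(x[0], mu[0], variances[0])
           * univariate_pdf(x[1], mu[1], variances[1]))
print(joint, product)  # the two values should agree up to rounding
```

With a non-diagonal covariance the same normalization applies, but the determinant no longer factors into the individual variances.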
The first one is instead more complex to grasp. The operation
$$ \mathbf{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) $$
represents a transformation of the vector $$(\mathbf{x}-\boldsymbol{\mu})$$,
which produces a new vector that we then multiply on the left by
$$ (\mathbf{x}-\boldsymbol{\mu})^\text{T} $$
Why is this double linear transformation equivalent to dividing by sigma squared in the one-dimensional case?
It must be because it accounts for the dispersion in the exponent. The exponent is a scalar, so it equals its own trace, and the cyclic property of the trace lets us rewrite it:
$$\begin{split}f(\textbf x)&\propto e^{-\frac 12 \text{tr}((x-\mu)^T\Sigma^{-1}(x-\mu))}\\ &=e^{-\frac 12\text{tr}((x-\mu)(x-\mu)^T\Sigma^{-1})}\end{split}$$
Since $(x-\mu)(x-\mu)^T$ is itself a measure of dispersion, multiplying it by the covariance would count the dispersion twice. Therefore, we need the inverse of the covariance, which rescales the observed deviation by the expected dispersion, for the pdf to make sense.
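The trace rewrite can be checked numerically. This is only a sketch in plain Python for a 2×2 case; the helper functions and the example numbers are assumptions made for illustration.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, q):
    """Plain nested-list matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))]
            for i in range(len(p))]

def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

# Example covariance, point and mean (arbitrary values).
sigma = [[2.0, 0.6], [0.6, 1.0]]
x = [1.5, -0.5]
mu = [1.0, 0.0]

dev = [xi - mi for xi, mi in zip(x, mu)]
row = [dev]                      # (x - mu)^T as a 1x2 matrix
col = [[dev[0]], [dev[1]]]       # (x - mu) as a 2x1 matrix
sigma_inv = inverse_2x2(sigma)

# Scalar quadratic form (x - mu)^T Sigma^{-1} (x - mu).
quad_form = matmul(matmul(row, sigma_inv), col)[0][0]

# Trace of the outer product (x - mu)(x - mu)^T times Sigma^{-1}.
outer = matmul(col, row)
trace_form = trace(matmul(outer, sigma_inv))

print(quad_form, trace_form)  # should agree up to rounding
```

The outer product here is the single-observation analogue of a scatter matrix, which is why the answer calls it a measure of dispersion.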
Alternatively, you can think of it in terms of quadratic forms: $x^TAx$ is the matrix equivalent of $ax^2$, with $A=\Sigma^{-1}$ playing the role of $a=1/\sigma^2$.
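To make the correspondence concrete, here is the one-dimensional case written out, assuming $k=1$ so that $\Sigma$ is the $1\times 1$ matrix $[\sigma^2]$, with inverse $1/\sigma^2$ and determinant $\sigma^2$:

$$\begin{split}(x-\mu)^T\Sigma^{-1}(x-\mu)&=(x-\mu)\,\frac{1}{\sigma^2}\,(x-\mu)=\frac{(x-\mu)^2}{\sigma^2}\\ \sqrt{(2\pi)^k|\Sigma|}&=\sqrt{2\pi\sigma^2}\end{split}$$

Both pieces reduce to the familiar univariate normal density, so multiplying by the inverse covariance is the direct generalization of dividing by $\sigma^2$.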