How to vectorize/matricize multivariate Gaussian PDF for more efficient computation?


Context: I was recently implementing (in Python) the Expectation-Maximization (EM) algorithm for Gaussian mixture models, and part of that process involves computing the Gaussian PDF for various points $x_i$ with various parameters. I implemented it somewhat naively, which was fast enough for what I needed, but I feel like it could be computed faster if it was written in a nice, clean matrix form. However, I don't know how you'd approach getting it in the form you'd need.

The multivariate Gaussian PDF I need is in the form:

$$ p_k(x_i \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^\frac{d}{2} \vert \Sigma_k \vert^{\frac{1}{2}}} e^{-\frac{1}{2} (x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k)} $$

Notation:

  • $x_i$ is a $d$-dimensional vector of input data, where $1 \le i \le N$ ($N$ is the total number of data points)
  • $k \in \{1, 2, \ldots, K\}$ indexes the mixture components; I have to calculate $p(x_i)$ for each value of $k$
  • $\mu_k$ is the $d$-dimensional mean vector of component $k$
  • $\Sigma_k$ is the $d \times d$ covariance matrix of component $k$

My main points of confusion:

The equation as I typed above gives a scalar probability for a single $i$ and a single $k$. I need to calculate it for all $i$, $k$, so in the end I'd want an $N \times K$ matrix of values.

The main portion of the exponential part, $(x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k)$, effectively multiplies a $1 \times d$ row vector by a $d \times d$ matrix and then by a $d \times 1$ column vector. With a single data point we get a $1 \times 1$ scalar as a result, but with more than one data point wouldn't we get an $N \times N$ matrix?

Can anyone help me figure out a more elegant way to write this? Again, what I'd ultimately like to end up with is some one-liner to get an $N \times K$ matrix, which I suppose you'd write as $p(X \mid \mu, \Sigma)$, where $X$ is an $N \times d$ matrix, $\mu$ is a $K \times d$ matrix, and $\Sigma$ is a $K \times d \times d$ array.
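For reference, my somewhat naive current implementation looks roughly like the sketch below (the names `gaussian_pdf_naive`, `X`, `mus`, and `covs` are just placeholders, not anything standard):

```python
import numpy as np

def gaussian_pdf_naive(X, mus, covs):
    """Naive double loop: returns the N x K matrix P with P[i, k] = p_k(x_i)."""
    N, d = X.shape
    K = mus.shape[0]
    P = np.empty((N, K))
    for k in range(K):
        inv = np.linalg.inv(covs[k])
        norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(covs[k]))
        for i in range(N):
            diff = X[i] - mus[k]                       # d-vector
            P[i, k] = norm * np.exp(-0.5 * diff @ inv @ diff)
    return P
```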


The result you need is actually the diagonal of the $N\times N$ matrix you get when computing $(X - \mu)\Sigma^{-1} (X - \mu)^T$, where $X$ is $N\times d$, $\mu$ is $1\times d$ and $\Sigma$ is $d\times d$.

One way to get just the diagonal, without computing the entire matrix, is to take the scalar product of each row of $(X - \mu)$ with the corresponding column of $\Sigma^{-1} (X - \mu)^T$. See this answer.

In Python, this gives:

((x - mu) * (np.linalg.inv(cov_mat) @ (x - mu).T).T).sum(-1)

where '*' is the term-wise product, and '@' is the matrix product.
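As a quick sanity check (variable names here are my own), the one-liner should match the diagonal of the full $N \times N$ product:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 6, 3
x = rng.normal(size=(N, d))          # N x d data
mu = rng.normal(size=(1, d))         # 1 x d mean (single component)
A = rng.normal(size=(d, d))
cov_mat = A @ A.T + d * np.eye(d)    # well-conditioned SPD covariance

# Diagonal-only computation: row-wise scalar products
q = ((x - mu) * (np.linalg.inv(cov_mat) @ (x - mu).T).T).sum(-1)

# Full N x N product; only its diagonal is actually needed
full = (x - mu) @ np.linalg.inv(cov_mat) @ (x - mu).T
```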

Now, you seem to have a $\mu$ of size $k\times d$ and $\Sigma$ of size $k\times d \times d$, which is an additional level of vectorization, and I'm not sure how to deal with that. What shape would $(X - \mu)$ be then? $k\times N \times d$?
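To sketch one possible answer to that open question: with NumPy broadcasting, $(X - \mu)$ does indeed become $K \times N \times d$, `np.linalg.inv` and `np.linalg.det` both operate batch-wise on a $K \times d \times d$ array, and `np.einsum` can contract the quadratic form for all $i, k$ at once. All variable and function names below are assumptions, not from the original post:

```python
import numpy as np

def gaussian_pdf_vectorized(X, mus, covs):
    """N x K matrix of Gaussian densities via broadcasting and einsum.

    X: (N, d) data, mus: (K, d) means, covs: (K, d, d) covariances.
    """
    N, d = X.shape
    diff = X[None, :, :] - mus[:, None, :]      # (K, N, d) via broadcasting
    inv = np.linalg.inv(covs)                   # (K, d, d), batched inverse
    # Quadratic form (x_i - mu_k)^T Sigma_k^{-1} (x_i - mu_k) for all i, k
    quad = np.einsum('kni,kij,knj->kn', diff, inv, diff)           # (K, N)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(covs))   # (K,)
    return (norm[:, None] * np.exp(-0.5 * quad)).T                 # (N, K)
```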