I'm looking at scikit-learn's way of calculating the score of a sample in a Gaussian mixture model, and there is something I don't get.
I'm taking as reference this definition: $$ \begin{align} p(\vec{x}) &= \sum_{i=1}^K\phi_i \mathcal{N}(\vec{x} \;|\; \vec{\mu}_i, \Sigma_i)\\ \mathcal{N}(\vec{x} \;|\; \vec{\mu}_i, \Sigma_i) &= \frac{1}{\sqrt{(2\pi)^D|\Sigma_i|}} \exp\left(-\frac{1}{2}(\vec{x}-\vec{\mu}_i)^\mathrm{T}{\Sigma_i}^{-1}(\vec{x}-\vec{\mu}_i)\right)\\ \sum_{i=1}^K\phi_i &= 1 \end{align} $$ where $K$ is the number of components, $D$ is the dimension of $\vec{x}$, $\vec{\mu}_k$ and $\Sigma_k$ are the mean and the covariance matrix of the $k^{\text{th}}$ component, and $\phi_k$ is the weight of the $k^{\text{th}}$ component in the mixture.
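To make the definition concrete, here is a minimal numpy/scipy sketch (all numbers are made up for illustration) that evaluates $p(\vec{x})$ directly for a toy two-component mixture:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Toy mixture: K = 2 components in D = 2 dimensions (illustrative values only).
    phi = np.array([0.3, 0.7])                              # weights, sum to 1
    means = [np.zeros(2), np.array([3.0, 1.0])]             # mu_1, mu_2
    covs = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]  # Sigma_1, Sigma_2

    x = np.array([1.0, 0.5])
    p_x = sum(w * multivariate_normal(mean=m, cov=S).pdf(x)
              for w, m, S in zip(phi, means, covs))
    print(np.log(p_x))  # log p(x); fine for a toy case, but exp/pdf can underflow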
scikit-learn computes $\log(p(\vec{x}))$ in the function `score_samples`, defined across base.py and gaussian_mixture.py:
    class BaseMixture[...]:
        [...]
        def score_samples(self, X):
            """Compute the weighted log probabilities for each sample."""
            [...]
            return logsumexp(self._estimate_weighted_log_prob(X), axis=1)

        def _estimate_weighted_log_prob(self, X):
            """Estimate the weighted log-probabilities, log P(X | Z) + log weights."""
            [...]
            return self._estimate_log_prob(X) + self._estimate_log_weights()
Ok, so here they just simplify the later calculations by moving everything into log space:
$$
\log(p(\vec{x})) = \log\left(\sum_{i=1}^K\exp\bigl(\log(\phi_i) + \log(\mathcal{N}_i)\bigr)\right)
$$
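The point of this log-sum-exp form is numerical stability: the per-component terms $\log(\phi_i) + \log(\mathcal{N}_i)$ can be very negative, and exponentiating them directly underflows. A quick sketch with made-up values:

    import numpy as np
    from scipy.special import logsumexp

    # Per-component terms log(phi_i) + log(N_i) for a point far out in the tails.
    log_terms = np.array([-1000.0, -1001.0, -1002.0])

    print(np.log(np.sum(np.exp(log_terms))))  # -inf: exp() underflows to 0
    print(logsumexp(log_terms))               # about -999.59, computed stably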
`_estimate_log_weights` just returns what its name says:
    class GaussianMixture(BaseMixture):
        def _estimate_log_weights(self):
            return np.log(self.weights_)
My issue is with `_estimate_log_prob`:
    def _estimate_log_prob(self, X):
        return _estimate_log_gaussian_prob(X, self.means_,
                                           self.precisions_cholesky_,
                                           self.covariance_type)
where `self.means_` is $\mu$ and `self.precisions_cholesky_` is "the Cholesky decomposition of the precision matrices of each mixture component. The precision matrix is the inverse of the covariance matrix" — that is, the inverse of $\Sigma_i$ from the original definition.
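For `covariance_type='full'`, my understanding is that the stored factor $P_k$ (= `precisions_cholesky_[k]`) satisfies $\Lambda_k = P_k P_k^\mathrm{T}$. A sketch of how such a factor can be built from a covariance matrix (I believe this mirrors what scikit-learn does internally for the 'full' case):

    import numpy as np
    from scipy.linalg import solve_triangular

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    cov = A @ A.T + 3 * np.eye(3)  # a made-up valid (SPD) covariance matrix

    # Cholesky of the covariance, then invert the triangular factor:
    cov_chol = np.linalg.cholesky(cov)  # cov = cov_chol @ cov_chol.T
    prec_chol = solve_triangular(cov_chol, np.eye(3), lower=True).T

    # The stored factor reproduces the precision matrix:
    assert np.allclose(prec_chol @ prec_chol.T, np.linalg.inv(cov))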
If we look at the definition of `_estimate_log_gaussian_prob`, we have:
    def _estimate_log_gaussian_prob(X, means, precisions_chol, covariance_type):
        """Estimate the log Gaussian probability."""
        n_samples, n_features = X.shape
        n_components, _ = means.shape
        # det(precision_chol) is half of det(precision)
        log_det = _compute_log_det_cholesky(
            precisions_chol, covariance_type, n_features)
        [...]
        log_prob = np.empty((n_samples, n_components))
        for k, (mu, prec_chol) in enumerate(zip(means, precisions_chol)):
            y = np.dot(X, prec_chol) - np.dot(mu, prec_chol)
            log_prob[:, k] = np.sum(np.square(y), axis=1)
        [...]
        return -.5 * (n_features * np.log(2 * np.pi) + log_prob) + log_det
This is supposed to compute: $$ \begin{align} \log(\mathcal{N}(\vec{x} \;|\; \vec{\mu}_i, \Sigma_i)) &= \log\left(\frac{1}{\sqrt{(2\pi)^D|\Sigma_i|}}\right) - \frac{1}{2}(\vec{x}-\vec{\mu}_i)^\mathrm{T}{\Sigma_i}^{-1}(\vec{x}-\vec{\mu}_i)\\ &= -\frac{D}{2}\log(2\pi) - \frac{1}{2}\log(|\Sigma_i|) - \frac{1}{2}(\vec{x}-\vec{\mu}_i)^\mathrm{T}{\Sigma_i}^{-1}(\vec{x}-\vec{\mu}_i) \end{align} $$ with $D$ being `n_features`.
At this point, I have two questions:
- Aren't `log_prob` and `log_det` inverted? (Why isn't it `return -.5 * (n_features * np.log(2 * np.pi) + log_det) + log_prob`?)
- Is `log_prob` really calculating $-\frac{1}{2}(\vec{x}-\vec{\mu}_i)^\mathrm{T}{\Sigma_i}^{-1}(\vec{x}-\vec{\mu}_i)$? How?
As you have stated in the question, the log probability of the multivariate Gaussian is as follows; I'll take a single component to simplify notation. $$ \log{\mathcal{N}(x \;|\; \mu, \Sigma)} = -\frac{D}{2}\log{2\pi} - \frac{1}{2}\log{|\Sigma|} - \frac{1}{2} (x - \mu)^T\Sigma^{-1}(x-\mu) \tag{1} $$ Here $D$ is the dimension of $x$ (`n_features` in the code).
The scikit-learn implementation works with the Cholesky decomposition of the precision matrices. The precision is the inverse of the covariance, i.e., $$ \Lambda = \Sigma^{-1} $$ Let $L$ denote the Cholesky factor of a particular precision matrix, so that $$ \Lambda = LL^T $$
Computing the `log_prob`
Starting from the Mahalanobis term in Equation (1), $$ \begin{align} ( x - \mu )^T \Sigma^{-1} (x - \mu ) =& ( x - \mu )^T \Lambda (x - \mu ) \\ =& ( x - \mu )^T LL^T (x - \mu ) \\ =& \|L^T (x - \mu )\|^2 \end{align} $$
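This identity is easy to check numerically; a sketch with made-up matrices:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3))
    Lam = A @ A.T + 3 * np.eye(3)  # an SPD precision matrix Lambda
    L = np.linalg.cholesky(Lam)    # Lambda = L @ L.T

    x, mu = rng.standard_normal(3), rng.standard_normal(3)
    d = x - mu
    assert np.allclose(d @ Lam @ d, np.sum(np.square(L.T @ d)))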
The algorithm first computes $y = L^T (x - \mu)$ using the two dot products (row-wise over the samples, as shown in the sketch below), and then takes the squared norm, i.e. $log\_prob = \|y\|^2$. It still needs the factor of $-\frac{1}{2}$, which is applied in the return statement.
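A sketch of that step, using a random triangular factor as a stand-in for `prec_chol`: the two dot products reduce to $(X - \mu)L$ by distributivity, and each row of $y$ is $(L^T(x_i - \mu))^T$.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((5, 3))                  # 5 samples, 3 features
    mu = rng.standard_normal(3)
    A = rng.standard_normal((3, 3))
    L = np.linalg.cholesky(A @ A.T + 3 * np.eye(3))  # stand-in for prec_chol

    y = np.dot(X, L) - np.dot(mu, L)          # the expression from the source
    assert np.allclose(y, (X - mu) @ L)       # same thing, row by row
    log_prob = np.sum(np.square(y), axis=1)   # squared Mahalanobis distances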
Computing the `log_det`
Starting from the log determinant term in equation (1), $$ \begin{align} -\frac{1}{2}\log{|\Sigma|} =& -\frac{1}{2} \log{|\Lambda^{-1}|} \\ =& \frac{1}{2}\log{|\Lambda|} \\ =& \frac{1}{2}\log{|LL^T|} \\ =& \log{|L|} \end{align} $$
In the return statement, $log\_det = \log{|L|}$ (for a triangular $L$, just the sum of the logs of its diagonal entries), and so there is no need to multiply it by $-\frac{1}{2}$. This also answers the first question: `log_prob` and `log_det` are not inverted; only `log_prob` still needs the $-\frac{1}{2}$ factor.
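This too is easy to verify numerically; a sketch (with a made-up covariance) checking that the summed log-diagonal of the precision's Cholesky factor equals $-\frac{1}{2}\log{|\Sigma|}$:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((3, 3))
    Sigma = A @ A.T + 3 * np.eye(3)               # a made-up covariance
    L = np.linalg.cholesky(np.linalg.inv(Sigma))  # Cholesky of the precision

    log_det_L = np.sum(np.log(np.diag(L)))  # log|L| for a triangular matrix
    assert np.allclose(log_det_L, -0.5 * np.log(np.linalg.det(Sigma)))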
The whole computation
In summary, the final computation in scikit-learn is, $$ \log{\mathcal{N}(x \;|\; \mu, \Sigma)} = -\frac{1}{2} \left( D\log{2\pi} + \|L^T(x - \mu)\|^2 \right) + \log{|L|} $$ which matches the return statement of `_estimate_log_gaussian_prob` term by term.
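As a final sanity check, the whole formula can be reproduced by hand and compared against `score_samples` on a fitted model (a sketch assuming `covariance_type='full'` and toy data):

    import numpy as np
    from scipy.special import logsumexp
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(4)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 2, (100, 2))])
    gmm = GaussianMixture(n_components=2, covariance_type='full',
                          random_state=0).fit(X)

    D = X.shape[1]
    log_prob = np.empty((X.shape[0], gmm.n_components))
    for k in range(gmm.n_components):
        P = gmm.precisions_cholesky_[k]       # Lambda_k = P @ P.T
        y = (X - gmm.means_[k]) @ P           # rows are L^T (x - mu_k)
        log_det = np.sum(np.log(np.diag(P)))  # log|P| = -0.5 * log|Sigma_k|
        log_prob[:, k] = -0.5 * (D * np.log(2 * np.pi)
                                 + np.sum(np.square(y), axis=1)) + log_det

    by_hand = logsumexp(log_prob + np.log(gmm.weights_), axis=1)
    assert np.allclose(by_hand, gmm.score_samples(X))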