Can gradient descent on covariance of Gaussian cause variances to become negative?


I have a Gaussian distribution with likelihood

$$p(x; \mu, \Sigma) = |2 \pi \Sigma|^{-1/2} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x-\mu)\right)$$

and I've computed the gradient of the log likelihood with respect to $\Sigma$ as

$$\nabla_{\Sigma} \log p(x; \mu, \Sigma) = -\frac{1}{2}(\Sigma^{-1} - \Sigma^{-1} (x-\mu)(x-\mu)^T \Sigma^{-1})$$
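One way to sanity-check a derivation like this is to compare the analytic gradient against a central finite difference of the log-likelihood. A minimal NumPy sketch (the particular numbers for $x$ and $\Sigma$ are made up):

```python
import numpy as np

def log_lik(Sigma, x, mu):
    # log N(x; mu, Sigma) = -1/2 log|2 pi Sigma| - 1/2 (x-mu)^T Sigma^{-1} (x-mu)
    r = x - mu
    return -0.5 * (np.log(np.linalg.det(2 * np.pi * Sigma))
                   + r @ np.linalg.solve(Sigma, r))

def grad_sigma(Sigma, x, mu):
    # The gradient derived above.
    Si = np.linalg.inv(Sigma)
    r = (x - mu).reshape(-1, 1)
    return -0.5 * (Si - Si @ r @ r.T @ Si)

# Central finite difference in the (0, 0) entry of Sigma:
x, mu = np.array([0.7, -1.2]), np.zeros(2)
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
eps = 1e-6
E = np.zeros((2, 2)); E[0, 0] = 1.0
fd = (log_lik(Sigma + eps * E, x, mu) - log_lik(Sigma - eps * E, x, mu)) / (2 * eps)
print(np.isclose(fd, grad_sigma(Sigma, x, mu)[0, 0]))  # -> True
```

The check passing suggests the formula itself is fine, so the problem must lie elsewhere.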

I'm now trying to optimize $\Sigma$ using gradient descent, but my $\Sigma$ is developing negative diagonal entries. That doesn't make sense, since the diagonal entries are variances and must therefore be non-negative. Which of the following is true?

a) Is this a known possibility when using gradient descent on covariances?

b) Is my derivation incorrect?

c) Is there an error with my gradient descent implementation?

Best answer:

A friend was able to provide me with an answer. I can't rule out an error in my code, but in general, yes: plain gradient descent on a positive (semi-)definite matrix can step outside the set of positive (semi-)definite matrices, since nothing in the update enforces the constraint.
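The one-dimensional case already shows the failure: with the observation exactly at the mean, the gradient pushes the variance down, and a large enough step overshoots zero. A minimal sketch (the step size and data values are made up):

```python
def naive_update(sigma2, x, mu, lr):
    """One plain gradient-ascent step on log N(x; mu, sigma2).

    In 1-D the gradient above reduces to
    d/d(sigma2) log p = -1/(2 sigma2) + (x - mu)**2 / (2 sigma2**2).
    """
    grad = -0.5 / sigma2 + 0.5 * (x - mu) ** 2 / sigma2 ** 2
    return sigma2 + lr * grad

# Observation at the mean, large (hypothetical) step size:
print(naive_update(1.0, x=0.0, mu=0.0, lr=3.0))  # -> -0.5, not a valid variance
```

A smaller step size delays the problem but doesn't remove it, which is why the reparameterizations below are the usual fix.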

This article provides two solutions:

  1. Replace the unconstrained optimization problem with a constrained optimization problem (not recommended)

  2. Reparameterize the covariance so the result is positive (semi-)definite by construction. For instance, define $\Sigma = A^T A$ and optimize $A$ instead, since $A^T A$ is positive semi-definite for every real matrix $A$.
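A rough NumPy sketch of the second approach, using the chain rule $\nabla_A = A(G + G^T)$ with $G = \nabla_\Sigma$ (the step size, sample count, and synthetic data are all made up):

```python
import numpy as np

def grad_sigma(Sigma, X, mu):
    # Gradient of sum_i log N(x_i; mu, Sigma) w.r.t. Sigma:
    # the single-observation formula above, summed over the data.
    Si = np.linalg.inv(Sigma)
    R = X - mu                      # (n, d) residuals
    S = R.T @ R                     # scatter matrix sum_i r_i r_i^T
    return -0.5 * (X.shape[0] * Si - Si @ S @ Si)

def step_A(A, X, mu, lr):
    # One ascent step on A, where Sigma = A^T A.
    # Chain rule: grad_A = A (G + G^T); G is symmetric here.
    G = grad_sigma(A.T @ A, X, mu)
    return A + lr * A @ (G + G.T)

rng = np.random.default_rng(0)
n, d = 50, 2
mu = np.zeros(d)
X = rng.normal(size=(n, d))        # synthetic data (hypothetical)

A = np.eye(d)
for _ in range(200):
    A = step_A(A, X, mu, lr=0.002)

Sigma = A.T @ A                    # PSD by construction, whatever A is
print(np.all(np.diag(Sigma) >= 0))  # -> True
```

However badly a step overshoots, $A^T A$ can never acquire a negative diagonal entry, because its diagonal entries are squared column norms of $A$.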