I have a Gaussian distribution with likelihood
$$p(x; \mu, \Sigma) = |2 \pi \Sigma|^{-1/2} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x-\mu)\right)$$
and I've computed the gradient of the log likelihood with respect to $\Sigma$ as
$$\nabla_{\Sigma} \log p(x; \mu, \Sigma) = -\frac{1}{2}(\Sigma^{-1} - \Sigma^{-1} (x-\mu)(x-\mu)^T \Sigma^{-1})$$
I'm now trying to optimize $\Sigma$ using gradient descent, but my $\Sigma$ is gaining negative diagonal entries, which doesn't make sense: the diagonal entries are variances and thus must be non-negative. Which of the following is true?
a) Is this a known possibility when using gradient descent on covariances?
b) Is my derivation incorrect?
c) Is there an error with my gradient descent implementation?
A friend was able to provide me with an answer. While I can't rule out an error in my code, in general, yes: directly optimizing a positive (semi-)definite matrix using gradient descent can produce iterates that are no longer positive (semi-)definite.
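To see how this can happen, here is a minimal NumPy sketch (1-D, with made-up numbers) where a single oversized gradient-ascent step on $\Sigma$ directly produces a negative "variance":

```python
import numpy as np

# Toy 1-D example with made-up values: one data point, Sigma = [[1.0]].
x, mu = np.array([0.1]), np.array([0.0])
Sigma = np.array([[1.0]])

Sigma_inv = np.linalg.inv(Sigma)
d = (x - mu).reshape(-1, 1)
# The gradient from the question: -1/2 (Sigma^-1 - Sigma^-1 (x-mu)(x-mu)^T Sigma^-1)
grad = -0.5 * (Sigma_inv - Sigma_inv @ d @ d.T @ Sigma_inv)

# Gradient *ascent* on the log-likelihood with an overly large step size:
Sigma = Sigma + 3.0 * grad
print(Sigma)  # approximately [[-0.485]]: the diagonal entry has gone negative
```

Nothing in the update rule knows about the positive-definiteness constraint, so a large enough step simply walks out of the feasible set.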
There are two standard solutions:
1. Replace the unconstrained optimization problem with a constrained optimization problem (not recommended).
2. Reparameterize the covariance so that the resulting matrix is P(S)D by construction. For instance, define $\Sigma = A^T A$ and optimize $A$ instead; then $\Sigma$ is always positive semi-definite, and positive definite whenever $A$ is invertible.
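As a sketch of the second option (the learning rate, step count, and synthetic data below are my own assumptions), run gradient ascent on $A$ with $\Sigma = A^T A$. By the chain rule, for a symmetric gradient $G = \nabla_\Sigma \log p$, the gradient with respect to $A$ is $\nabla_A = A(G + G^T) = 2AG$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a known 2-D Gaussian (illustrative parameter choice).
true_cov = np.array([[2.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=5000)

mu = X.mean(axis=0)
S = (X - mu).T @ (X - mu) / len(X)  # sample covariance = MLE of Sigma

# Reparameterize Sigma = A^T A and run gradient ascent on A.
A = np.eye(2)
lr = 0.05  # assumed step size
for _ in range(2000):
    Sigma = A.T @ A
    Sigma_inv = np.linalg.inv(Sigma)
    # Gradient of the average log-likelihood w.r.t. Sigma: the formula from
    # the question, with the outer product replaced by its sample mean S.
    G = -0.5 * (Sigma_inv - Sigma_inv @ S @ Sigma_inv)
    # Chain rule for Sigma = A^T A (G is symmetric): grad_A = 2 A G
    A = A + lr * 2 * A @ G

Sigma = A.T @ A
# Sigma is PSD by construction at every step, and the fixed point G = 0
# recovers the MLE Sigma = S.
```

Every iterate $\Sigma = A^T A$ is positive semi-definite no matter what step the optimizer takes, which is exactly why this reparameterization sidesteps the problem in the question.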