Spherical Gaussian MLE


I am having trouble with a derivation. I want to find the MLE of $\sigma^2$ for a spherical Gaussian, i.e. when we set $\Sigma = \sigma^2 I$.

I have already seen https://stats.stackexchange.com/questions/238199/mle-of-multivariate-normal-distribution-with-diagonal-covariance-matrix but wanted to do the derivation more thoroughly.

Say that we have $m$ data points, each with $p$ features. The log-likelihood of a multivariate Gaussian is then (from https://stats.stackexchange.com/questions/351549/maximum-likelihood-estimators-multivariate-gaussian/351550)

$$l(\mu, \Sigma ) = - \frac{mp}{2} \log (2 \pi) - \frac{m}{2} \log |\Sigma| - \frac{1}{2} \sum_{i=1}^m \mathbf{(x^{(i)} - \mu)^T \Sigma^{-1} (x^{(i)} - \mu) } $$

Now substitute $\Sigma = \sigma^2 I$. Then $|\Sigma| = |\sigma^2 I| = (\sigma^2)^p$, so $\log |\Sigma| = p \log \sigma^2$, and $\Sigma^{-1} = \frac{1}{\sigma^2} I$. This gives

$$l(\mu, \sigma^2 ) = - \frac{mp}{2} \log (2 \pi) - \frac{mp}{2} \log \sigma^2 - \frac{1}{2 \sigma^2} \sum_{i=1}^m \mathbf{(x^{(i)} - \mu)^T (x^{(i)} - \mu) }$$

Differentiating with respect to $\sigma^2$, we get

$$\frac{\partial l}{\partial \sigma^2} = -\frac{mp}{2}\frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^m \mathbf{(x^{(i)} - \mu)^T (x^{(i)} - \mu) }$$

Setting the derivative to 0 we get that

$$\hat \sigma^2 = \frac{1}{mp} \sum_{i=1}^m \mathbf{(x^{(i)} - \mu)^T (x^{(i)} - \mu) } $$
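One way to sanity-check this closed form is to evaluate the spherical log-likelihood above on a grid of $\sigma^2$ values and see where it peaks. The following is only a minimal sketch: the function name `spherical_log_likelihood`, the seed, the sample sizes, and the grid are all assumptions made for illustration, and the mean is treated as known and equal to zero.

```python
import numpy as np

def spherical_log_likelihood(X, mu, sigma2):
    """Log-likelihood l(mu, sigma^2) for Sigma = sigma^2 * I (hypothetical helper)."""
    m, p = X.shape
    sq_dist = np.sum((X - mu) ** 2)  # sum_i (x_i - mu)^T (x_i - mu)
    return (-0.5 * m * p * np.log(2 * np.pi)
            - 0.5 * m * p * np.log(sigma2)
            - sq_dist / (2 * sigma2))

# Assumed setup: m points with p features, true sigma^2 = 2, known mean 0
rng = np.random.default_rng(1)
m, p = 2000, 3
mu = np.zeros(p)
X = rng.normal(0.0, np.sqrt(2.0), size=(m, p))

# Closed form from setting the derivative to zero
closed_form = np.sum((X - mu) ** 2) / (m * p)

# Brute-force maximiser on a grid with spacing 0.005
grid = np.linspace(0.5, 4.0, 701)
values = np.array([spherical_log_likelihood(X, mu, s2) for s2 in grid])
grid_best = grid[np.argmax(values)]

print(closed_form, grid_best)
```

If the derivation is right, the grid maximiser should sit within one grid step of the closed-form value.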

But this is not the same as other results I have found online. For example, https://stats.stackexchange.com/questions/238199/mle-of-multivariate-normal-distribution-with-diagonal-covariance-matrix seems to argue that $\hat \sigma^2 = \frac{1}{m} \sum_{i=1}^m \mathbf{(x^{(i)} - \mu)^T (x^{(i)} - \mu) }$, and http://cs229.stanford.edu/section/gaussians.pdf states that it should equal the MLE from the univariate case.

What is going on here? For a while I suspected that I was computing the determinant incorrectly, which would produce the extra $p$ in the denominator, but I do not think that is the case. Can you help me find what is wrong in my reasoning?

Best answer

Interestingly, I seem to have misunderstood the links I was referring to.

I ran a quick simulation in Python to check whether my formula recovers $\sigma^2$.

import numpy as np

# Draw n samples from a p-dimensional spherical Gaussian with known variance
correct_sigma2 = 2
n = 5000
p = 2
cov = correct_sigma2 * np.eye(p)
data = np.random.multivariate_normal(np.zeros(p), cov, n)

# Estimate with known mean 0: sum of squared norms divided by n * p
sigma_hat2 = 0.0
for x in data:
    sigma_hat2 += np.dot(x, x)
sigma_hat2 /= n * p

print(sigma_hat2)

With the true value $\sigma^2 = 2$, this code outputs $\hat \sigma^2 \approx 2$. It seems it was all a misunderstanding on my side.
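The same check can be written without the loop, and it is easy to compare the two candidate normalisations side by side. This is a hedged sketch rather than a definitive implementation: the helper `spherical_sigma2_mle`, the fixed seed, and the option of plugging in the sample mean when $\mu$ is unknown are assumptions added here.

```python
import numpy as np

def spherical_sigma2_mle(X, mu=None):
    """Average squared deviation over all m * p coordinates (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    m, p = X.shape
    if mu is None:
        # Assumption: use the sample mean when the true mean is unknown
        mu = X.mean(axis=0)
    centered = X - mu
    return np.sum(centered ** 2) / (m * p)

rng = np.random.default_rng(0)
true_sigma2 = 2.0
m, p = 5000, 2
# Independent coordinates with equal variance, i.e. Sigma = sigma^2 * I
X = rng.normal(loc=0.0, scale=np.sqrt(true_sigma2), size=(m, p))

with_p = spherical_sigma2_mle(X, mu=np.zeros(p))  # divides by m * p
without_p = np.sum(X ** 2) / m                    # divides by m only

print(with_p)     # should be close to true_sigma2
print(without_p)  # should be close to p * true_sigma2
```

Dividing by $m$ alone estimates $p\sigma^2$, the expected squared norm of a sample, rather than the per-coordinate variance $\sigma^2$.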

To be clear, the MLE for a spherical Gaussian distribution with $\Sigma = \sigma^2 I$ therefore seems to be $$ \hat \sigma^2 = \frac{1}{mp} \sum_{i=1}^m \mathbf{(x^{(i)} - \mu)^T (x^{(i)} - \mu) }$$ at least according to this quick simulation. I hope this helps anyone who confuses themselves as badly as I did.
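A possible way to reconcile this with the diagonal-covariance case is to expand the inner product coordinate by coordinate. The index $j$ and the per-coordinate estimates $\hat \sigma_j^2$ are notation introduced here for illustration. They are the usual univariate MLEs, which I assume is what the diagonal-covariance result gives for each coordinate separately:

$$ \hat \sigma^2 = \frac{1}{mp} \sum_{i=1}^m \sum_{j=1}^p \left(x_j^{(i)} - \mu_j\right)^2 = \frac{1}{p} \sum_{j=1}^p \hat \sigma_j^2, \qquad \hat \sigma_j^2 = \frac{1}{m} \sum_{i=1}^m \left(x_j^{(i)} - \mu_j\right)^2 $$

Under this reading, the spherical estimate is the average of the $p$ per-coordinate variance estimates, so for $p = 1$ it reduces to the univariate MLE.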