I am having trouble with a derivation. I want to find the MLE of $\sigma^2$ for a spherical Gaussian, i.e. when we set $\Sigma = \sigma^2 I$.
I have already seen https://stats.stackexchange.com/questions/238199/mle-of-multivariate-normal-distribution-with-diagonal-covariance-matrix but wanted to do the derivation more thoroughly.
Say that we have $m$ points and that each data point has $p$ features. We then know that the log-likelihood of a multivariate Gaussian is (from https://stats.stackexchange.com/questions/351549/maximum-likelihood-estimators-multivariate-gaussian/351550)
$$l(\mu, \Sigma ) = - \frac{mp}{2} \log (2 \pi) - \frac{m}{2} \log |\Sigma| - \frac{1}{2} \sum_{i=1}^m \mathbf{(x^{(i)} - \mu)^T \Sigma^{-1} (x^{(i)} - \mu) } $$
Now, substituting $\Sigma = \sigma^2 I$, so that $|\Sigma| = |\sigma^2 I| = (\sigma^2)^p$ and $\Sigma^{-1} = \frac{1}{\sigma^2} I$, we get
$$l(\mu, \sigma^2 ) = - \frac{mp}{2} \log (2 \pi) - \frac{mp}{2} \log \sigma^2 - \frac{1}{2 \sigma^2} \sum_{i=1}^m \mathbf{(x^{(i)} - \mu)^T (x^{(i)} - \mu) }$$
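As a quick sanity check (my own addition, not from the links above; all the numbers in it are arbitrary test values), we can verify numerically that this simplified expression agrees with the general log-likelihood evaluated at $\Sigma = \sigma^2 I$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 50, 3                      # arbitrary sample size and dimension
mu = rng.normal(size=p)
sigma2 = 1.7                      # arbitrary test value of sigma^2
X = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=(m, p))
Sigma = sigma2 * np.eye(p)
diff = X - mu

# General formula: -mp/2 log(2*pi) - m/2 log|Sigma| - 1/2 sum_i (x_i - mu)^T Sigma^{-1} (x_i - mu)
quad = np.einsum('ij,jk,ik->', diff, np.linalg.inv(Sigma), diff)
ll_full = (-0.5 * m * p * np.log(2 * np.pi)
           - 0.5 * m * np.log(np.linalg.det(Sigma))
           - 0.5 * quad)

# Simplified spherical formula from above
ll_spherical = (-0.5 * m * p * np.log(2 * np.pi)
                - 0.5 * m * p * np.log(sigma2)
                - np.sum(diff ** 2) / (2 * sigma2))

print(np.isclose(ll_full, ll_spherical))  # True
```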
Differentiating with respect to $\sigma^2$, we get
$$\frac{\partial l}{\partial \sigma^2} = -\frac{mp}{2}\frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^m \mathbf{(x^{(i)} - \mu)^T (x^{(i)} - \mu) }$$
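This derivative can also be checked against a central finite difference (again my own sketch; the test values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
m, p = 20, 4                      # arbitrary sample size and dimension
X = rng.normal(size=(m, p))       # mu = 0 here for simplicity
sq = np.sum(X ** 2)               # sum_i (x_i - mu)^T (x_i - mu)

def ll(s2):
    """Spherical log-likelihood as a function of sigma^2."""
    return -0.5 * m * p * np.log(2 * np.pi) - 0.5 * m * p * np.log(s2) - sq / (2 * s2)

s2, h = 1.3, 1e-6                 # evaluation point and step size
numeric = (ll(s2 + h) - ll(s2 - h)) / (2 * h)
analytic = -0.5 * m * p / s2 + sq / (2 * s2 ** 2)

print(np.isclose(numeric, analytic))  # True
```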
Setting the derivative to 0 and solving for $\sigma^2$, we get
$$\hat \sigma^2 = \frac{1}{mp} \sum_{i=1}^m \mathbf{(x^{(i)} - \mu)^T (x^{(i)} - \mu) } $$
But this is not the same as other results I have found online (for example https://stats.stackexchange.com/questions/238199/mle-of-multivariate-normal-distribution-with-diagonal-covariance-matrix , which I thought argued that $\hat \sigma^2 = \frac{1}{m} \sum_{i=1}^m \mathbf{(x^{(i)} - \mu)^T (x^{(i)} - \mu) }$, and http://cs229.stanford.edu/section/gaussians.pdf , which states that it should equal the MLE from the univariate case).
What is going on here? For a while I suspected that I was computing the determinant incorrectly, thus producing the extra $p$ in the denominator, but I don't think that is the case. Can you help me find out what is wrong in my reasoning?
Interestingly, I seem to have misunderstood the links I was referring to.
I did a quick simulation in Python to check whether my formula recovers $\sigma^2$.
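A minimal version of that simulation might look like this (the sample size, dimension, and seed here are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(42)
m, p = 100_000, 5                 # many points, 5 features
mu = np.zeros(p)                  # known mean for simplicity
sigma2_true = 2.0                 # the "key" value

# Draw m points from N(mu, sigma2_true * I)
X = rng.normal(loc=mu, scale=np.sqrt(sigma2_true), size=(m, p))

# hat sigma^2 = (1 / (m*p)) * sum_i (x_i - mu)^T (x_i - mu)
sigma2_hat = np.sum((X - mu) ** 2) / (m * p)
print(sigma2_hat)                 # approximately 2
```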
With a true ("key") value of $\sigma^2 = 2$, this code outputs $\hat \sigma^2 \approx 2$. It seems it was all a misunderstanding on my part.
To be clear, the correct MLE for a spherical Gaussian with $\Sigma = \sigma^2 I$ thus seems to be $$ \hat \sigma^2 = \frac{1}{mp} \sum_{i=1}^m \mathbf{(x^{(i)} - \mu)^T (x^{(i)} - \mu) }$$ at least according to this quick simulation. I hope this helps anyone who confuses themselves as badly as I did in the future.