This one has been driving me nuts for a while, and I haven't been able to find a treatment of the issue that I can actually follow.
I'm trying to derive the maximum likelihood estimator for the multivariate Gaussian distribution. I know what form $\hat \Sigma$ should have, after taking derivatives and setting them to zero--the problem is that I can't seem to show that this critical point really is a maximum.
What I have so far is this: it's enough to solve
$$ \min_{\Sigma \succeq 0 }f(\Sigma):= \min_{\Sigma \succeq 0 } -\log |\Sigma^{-1}| + (x - \mu)^T \Sigma^{-1} (x - \mu).$$
Further, from the spectral theorem, we find that $\Sigma^{-1}$, being symmetric positive definite, satisfies
$$ \Sigma^{-1} = PDP^T $$
for an orthogonal matrix $P$ and diagonal matrix $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, where all the eigenvalues $\lambda_i > 0$ and are real.
Writing $(x - \mu)^T \Sigma^{-1} (x - \mu) = (x - \mu)^T PDP^T (x - \mu) = (P^T (x - \mu))^T D (P^T (x - \mu)) := v^T D v = \sum_i v_i^2 \lambda_i$,
The minimization problem then reduces to $$ \min_{\Sigma \succeq 0 } -\log |\Sigma^{-1}| + (x - \mu)^T \Sigma^{-1} (x - \mu) = \min_{\Sigma \succeq 0 } -\log \prod_i \lambda_i + \sum_i v_i^2\lambda_i. $$
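As a quick numerical sanity check on the eigen-decomposition step (a sketch using numpy; the specific matrices here are just illustrative, not part of the derivation), the quadratic form does equal $\sum_i v_i^2 \lambda_i$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random symmetric positive definite Sigma^{-1} as A A^T + I.
A = rng.standard_normal((4, 4))
Sigma_inv = A @ A.T + np.eye(4)

x = rng.standard_normal(4)
mu = rng.standard_normal(4)

# Spectral decomposition: Sigma^{-1} = P diag(lam) P^T.
lam, P = np.linalg.eigh(Sigma_inv)
v = P.T @ (x - mu)

quad_form = (x - mu) @ Sigma_inv @ (x - mu)
# The quadratic form equals the eigenvalue-weighted sum of squares.
assert np.isclose(quad_form, np.sum(v**2 * lam))
```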
From here, I'm not sure where to go. I tried to appeal to level-boundedness of the last expression with respect to $\boldsymbol\lambda = (\lambda_1, \ldots, \lambda_n)$ in order to show that a minimizer must exist. Given that, the minimizer would have to occur at a critical point, and since $f(\Sigma)$ has only one critical point--namely the one found by setting the derivative with respect to $\Sigma$ equal to zero--that critical point would have to be the minimizer.
But then I realized we can't parametrize all PSD matrices by considering only the eigenvalues, so this argument clearly doesn't work: the $v_i$ also vary as the matrix changes.
Can anyone actually prove that the critical point of the likelihood function is the MLE for $\Sigma$, i.e. that it really is a maximizer of the likelihood (in my formulation, a minimizer of $f$)?
Thanks!
Reparametrizing $H=\Sigma^{-1}$ and setting $v_i=x_i-\mu$, we search for the minimum of $$f(H)=-n\log|H|+\sum_{i=1}^n v_i^T H v_i$$ over the set of positive definite matrices.
Let's take the first differential over the space of symmetric matrices: $$df(H)=-n\operatorname{tr}(H^{-1}\,dH)+\operatorname{tr}\Big(\sum_{i=1}^n v_iv_i^T\,dH\Big).$$ Setting the differential to $0$ and solving, $$\hat H=n\Big(\sum_{i=1}^n v_iv_i^T\Big)^{-1}\ \text{thus}\ \hat\Sigma=\frac{1}{n}\sum_{i=1}^n v_iv_i^T.$$ For this inverse to exist, the number of linearly independent vectors $v_i$ must be at least the size of the matrix $\Sigma$. If it is, then $\hat H$ is positive definite.
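As a hedged numerical check of this critical-point computation (a numpy sketch on synthetic data, not part of the proof), the gradient $-nH^{-1}+\sum_i v_iv_i^T$ does vanish at $\hat H$, and $\hat H$ comes out positive definite:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3  # more samples than dimensions, so the inverse exists

V = rng.standard_normal((n, d))   # rows play the role of v_i = x_i - mu
S = V.T @ V                       # sum_i v_i v_i^T
H_hat = n * np.linalg.inv(S)      # candidate critical point of f
Sigma_hat = S / n                 # the usual MLE for the covariance

# Gradient of f(H) = -n log|H| + tr(S H) is -n H^{-1} + S;
# it should vanish at H_hat.
grad = -n * np.linalg.inv(H_hat) + S
assert np.allclose(grad, 0)

# H_hat is positive definite (all eigenvalues strictly positive).
assert np.all(np.linalg.eigvalsh(H_hat) > 0)
```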
Now we find the second differential over the space of symmetric matrices: $$d^2f(H)=d[-n\operatorname{tr}(H^{-1}\,dH)]+0=-n\operatorname{tr}(dH^{-1}\,dH)=n\operatorname{tr}(H^{-1}\,dH\,H^{-1}\,dH)=$$ $$=n\operatorname{tr}((H^{-1}dH)^2),$$ using the identity $dH^{-1}=-H^{-1}\,dH\,H^{-1}$. Note that $H^{-1}dH$ is similar to a symmetric matrix, because $dH$ is symmetric (we take the differential over symmetric matrices only), $H^{-1}$ is positive definite, and $$H^{-1}dH=H^{-\frac{1}{2}}H^{-\frac{1}{2}}dH=H^{-\frac{1}{2}}\big(H^{-\frac{1}{2}}\,dH\,H^{-\frac{1}{2}}\big)H^{\frac{1}{2}}.$$ So $H^{-1}dH$ has real eigenvalues, since eigenvalues are invariant under similarity and symmetric matrices have real eigenvalues.
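The similarity argument can be spot-checked numerically (again a numpy sketch with arbitrary random matrices): for positive definite $H$ and symmetric $dH$, the eigenvalues of $H^{-1}dH$ come out real.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5

A = rng.standard_normal((d, d))
H = A @ A.T + np.eye(d)          # positive definite H

B = rng.standard_normal((d, d))
dH = (B + B.T) / 2               # symmetric perturbation

# H^{-1} dH is not symmetric in general, but it is similar to the
# symmetric matrix H^{-1/2} dH H^{-1/2}, so its spectrum is real.
eigvals = np.linalg.eigvals(np.linalg.inv(H) @ dH)
assert np.max(np.abs(eigvals.imag)) < 1e-6
```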
Now, since the trace of a matrix is the sum of its eigenvalues, and the eigenvalues of a matrix power are the corresponding powers of the eigenvalues, $$d^2f(H)=n\operatorname{tr}((H^{-1}dH)^2)=n\sum_j\lambda_j(H^{-1}dH)^2\geq0$$ for any symmetric $dH$. So the second differential is positive semidefinite, meaning that $f$ is convex on the positive definite cone. For a convex function, a critical point is a global minimum.
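Finally, the conclusion itself can be checked empirically (a numpy sketch with synthetic data; this illustrates the result rather than proving it): $f(\hat H)\le f(H)$ for random positive definite trial points $H$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 40, 3

V = rng.standard_normal((n, d))
S = V.T @ V                       # sum_i v_i v_i^T
H_hat = n * np.linalg.inv(S)      # the critical point of f

def f(H):
    # f(H) = -n log|H| + sum_i v_i^T H v_i = -n log|H| + tr(S H)
    sign, logdet = np.linalg.slogdet(H)
    assert sign > 0               # only evaluate on positive definite H
    return -n * logdet + np.trace(S @ H)

f_hat = f(H_hat)
for _ in range(100):
    B = rng.standard_normal((d, d))
    H = B @ B.T + 1e-3 * np.eye(d)   # random positive definite trial point
    # By convexity, the critical point should be the global minimizer.
    assert f(H) >= f_hat - 1e-9
```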