Maximum Likelihood Estimation of a Multivariate Gaussian Density, When the Number of Samples Is Smaller Than the Number of Unknown Parameters


If we want to estimate the $p\times p$ (full-rank) covariance matrix $\Sigma$ of a multivariate normal density using $n$ sample vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$, then the empirical covariance matrix is known to be the maximum likelihood estimate $\Sigma_\text{ML}$ of $\Sigma$, and a reliable one at least when $n\gg p$. But when $n\propto p$, and in particular when $n<p$, the ML estimate is not reliable and becomes singular: assuming zero mean, $\Sigma_\text{ML} = \dfrac{1}{n} XX'$, where $X=[\mathbf{x}_1,\ldots,\mathbf{x}_n]$ is a $p\times n$ matrix. Since $n<p$, the rank of the $p\times p$ matrix $\Sigma_\text{ML}$ is at most $n$, making it singular. This is obviously not a good estimate, since the samples come from a distribution whose covariance $\Sigma$ has full rank.
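A minimal numerical sketch of this rank deficiency (assuming zero mean, with `numpy`; the dimensions `p = 10`, `n = 4` are arbitrary choices for illustration):

```python
import numpy as np

# With n < p samples from a full-rank Gaussian, the ML covariance
# estimate (1/n) X X' has rank at most n, hence is singular.
rng = np.random.default_rng(0)
p, n = 10, 4                       # dimension p exceeds sample count n
Sigma = np.eye(p)                  # true covariance: full rank
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n).T  # p x n

Sigma_ml = (X @ X.T) / n           # ML estimate (mean assumed known = 0)
print(np.linalg.matrix_rank(Sigma_ml))   # at most n = 4 < p = 10
print(np.linalg.det(Sigma_ml))           # numerically ~ 0: singular
```

Each column of $\Sigma_\text{ML}$ is a linear combination of the $n$ sample vectors, which is exactly why the rank cannot exceed $n$.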

But the covariance matrix simply consists of pairwise covariance values $\Sigma=(\sigma_{ij})$, and to compute each one we can use the sample vectors (using the $i$-th and $j$-th components of each of the $n$ sample vectors), which are apparently enough to estimate $\sigma_{ij}$. We can do this for all the covariance elements. So why can we still not rely on the ML estimate? Is it because we reuse those $n$ samples $p(p+1)/2$ times (once per $\sigma_{ij}$), which results in somehow dependent rows (columns), and hence a singular $\Sigma_\text{ML}$? And how can we show this rigorously?
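One way to see why the entrywise construction does not help: filling in each $\sigma_{ij}$ from the samples reproduces exactly the matrix $\frac{1}{n}XX'$, so it inherits the rank-$n$ deficiency. A small check (again assuming zero mean; `p = 6`, `n = 3` are arbitrary illustrative choices):

```python
import numpy as np

# Assembling the covariance estimate entrywise from pairwise sample
# covariances gives exactly the same matrix as (1/n) X X'.
rng = np.random.default_rng(1)
p, n = 6, 3
X = rng.standard_normal((p, n))    # columns are the samples (mean 0 assumed)

# Entrywise: sigma_ij = (1/n) * sum_k  X[i, k] * X[j, k]
Sigma_pairwise = np.empty((p, p))
for i in range(p):
    for j in range(p):
        Sigma_pairwise[i, j] = np.dot(X[i, :], X[j, :]) / n

Sigma_ml = (X @ X.T) / n
print(np.allclose(Sigma_pairwise, Sigma_ml))  # True: identical matrices
print(np.linalg.matrix_rank(Sigma_ml))        # at most n = 3, so singular
```

So the pairwise estimates are individually fine, but jointly they are constrained: every column of the assembled matrix lies in the $n$-dimensional span of the samples.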