I've started reading about covariance matrix estimation through graphical models in the high-dimensional setting, but I have several questions.
Suppose $X_i \overset{iid}{\sim} N_p(\mu,\Sigma)$, $i=1, \dots, n$, and define $S=\frac{1}{n}\sum_{i=1}^{n} (X_i-\bar{X})(X_i-\bar{X})'$. Almost everywhere in the literature, $S$ is described as a 'bad estimator' (or one that performs poorly) when $p$ is large and $n$ is small. However, as I understand it, $S$ is always symmetric and non-negative definite, and it is positive definite (w.p. 1) if $n\geq p+1$. Also, $\hat{\Sigma}_{mle}=S$. My questions are:
1) Why does $S$ perform poorly when $p>n$?
2) If $\Sigma$ is sparse, why do we use a different approach to estimate it (a methodology involving the graphical representation of $Q=\Sigma^{-1}$)?
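To make the setup concrete, here is a minimal NumPy sketch of what I mean. It is only an illustration under my own assumptions: the helper name `sample_covariance_mle`, the sizes ($p=50$, $n\in\{20,200\}$), the seed, and the choice $\mu=0$, $\Sigma=I_p$ are arbitrary and not taken from any of the references. It computes $S$ as defined above and reports its rank and extreme eigenvalues in both regimes.

```python
import numpy as np

def sample_covariance_mle(X):
    """Return S = (1/n) * sum_i (x_i - xbar)(x_i - xbar)' for the rows x_i of X."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # center each column at its sample mean
    return Xc.T @ Xc / n             # p x p matrix, divisor n (the Gaussian MLE)

rng = np.random.default_rng(0)       # fixed seed, chosen arbitrarily
p = 50
results = {}
for n in (20, 200):                  # one case with p > n, one with n >= p + 1
    X = rng.standard_normal((n, p))  # assumed truth: mu = 0, Sigma = I_p
    S = sample_covariance_mle(X)
    eigvals = np.linalg.eigvalsh(S)  # eigenvalues of a symmetric matrix, ascending
    rank = np.linalg.matrix_rank(S)
    results[n] = (rank, eigvals.min(), eigvals.max())
    print(f"n={n:3d}, p={p}: rank(S)={rank}, "
          f"min eig={eigvals.min():.4f}, max eig={eigvals.max():.4f}")
```

Since centering removes one degree of freedom, the rank of $S$ can be at most $n-1$, so in the $p>n$ case $S$ is singular and $S^{-1}$ does not exist. Every true eigenvalue is 1 in this simulation, so the printed extremes show how far the sample eigenvalues spread in each regime.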
BTW, I am reading Graphical Models by S. Lauritzen and Gaussian Markov Random Fields by Rue and Held. Please feel free to suggest any other books, materials, etc. that you think will help me understand.