This is likely a very vague and "wishy-washy" question, but I wanted to ask it on here. I have recently begun studying mathematical statistics and have been working with MLE estimators for a while now. There are of course a few exceptions, but the MLE always seems to be the empirical average, the empirical variance, or another obvious and intuitive sum that one would guess for the parameter in question. For example, for a sequence of IID normal random variables the MLEs are $\hat{\mu} = \frac{1}{n}\sum\limits_{i=1}^n X_i$ and $\hat{\sigma}^2 = \frac{1}{n}\sum\limits_{i=1}^n (X_i - \hat{\mu})^2$.
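As a sanity check on those formulas, here is a small sketch (the seed, sample size, and true parameters are arbitrary choices of mine) that compares the closed-form normal MLEs against the log-likelihood at nearby parameter values; since the closed forms are the unique maximizers, they should always win:

```python
import math
import random

random.seed(0)
n = 500
data = [random.gauss(2.0, 1.5) for _ in range(n)]

# Closed-form MLEs: sample mean and the 1/n (not 1/(n-1)) variance.
mu_hat = sum(data) / n
var_hat = sum((x - mu_hat) ** 2 for x in data) / n

def log_lik(mu, var):
    """Normal log-likelihood of the whole sample."""
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in data) / (2 * var))

# The closed-form MLE should beat perturbed parameter values.
best = max(
    ((mu, var)
     for mu in [mu_hat - 0.1, mu_hat, mu_hat + 0.1]
     for var in [var_hat - 0.1, var_hat, var_hat + 0.1]),
    key=lambda p: log_lik(*p),
)
assert best == (mu_hat, var_hat)
```

Note the $1/n$ normalization: the MLE of the variance is biased, unlike the usual $1/(n-1)$ sample variance.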
Would someone be able to provide some intuition for this "rule of thumb"? Maybe I have just been looking too closely into this and it's actually very obvious. It does feel natural; I just can't seem to explain it at least somewhat rigorously.
Thanks :)
The fact that the MLE of the variance equals the sample variance critically hinges on the data being normal. For example, let the data be Poisson-distributed ($P(x)=\frac{\lambda^x e^{-\lambda}}{x!}$ with $x\in\mathbb{N}$). A Poisson variable has variance equal to its mean $\lambda$, and the MLE of $\lambda$ is the sample mean. So the maximum-likelihood estimator of the variance equals the sample mean, and is thus *not* equal to the sample variance. This follows from a very nice property of the MLE called "functional invariance": if $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$.
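A quick numerical illustration of the Poisson case (my own sketch; the rate $\lambda = 4$, the seed, and the sampler via Knuth's algorithm are arbitrary choices to stay within the standard library): the ML estimate of the variance is the sample mean, which sits near the sample variance but does not coincide with it.

```python
import math
import random

random.seed(1)

def poisson(lam):
    """Draw one Poisson(lam) sample via Knuth's multiplication algorithm."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

lam = 4.0
n = 10_000
data = [poisson(lam) for _ in range(n)]

# MLE of lambda is the sample mean; by functional invariance it is
# also the MLE of the variance (since Var = lambda for a Poisson).
mean_hat = sum(data) / n
# The plain sample variance is a different statistic:
sample_var = sum((x - mean_hat) ** 2 for x in data) / n

print(mean_hat, sample_var)  # both close to 4, but not identical
```

Both estimates converge to $\lambda$, yet they are distinct random variables: the MLE of the variance here never even looks at squared deviations.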
I think the conclusion is this: the MLE gives a very natural estimator for the *parameters* of a distribution, and one of the parameters of the normal distribution happens to be the variance. That's where the coincidence of the empirical estimator and the ML estimator comes from.