Estimating the variance of a population


I know how to use the MLE method and the method of moments to estimate the parameter(s) of a given distribution. For example, for the parameter $\theta$ of a population $X \sim Ber(\theta)$, the MLE of $\theta$ is $\hat{\theta} = \frac{1}{n}\sum_i x_i$.

However, this only estimates the parameters of the distribution. What if I want to estimate the variance of the distribution? What should I do? I know I could use the unbiased estimator of the variance from a basic statistics course, but is there a universally accepted procedure for doing this after computing the MLE?

For example, for the population $X \sim Ber(\theta)$, $V(X) = \theta(1 - \theta)$, and the MLE of $\theta$ is $\hat{\theta} = \frac{1}{n}\sum_i x_i$. I can replace $\theta$ by $\hat{\theta}$ to estimate the variance, though the resulting estimator will surely be biased. Can I do the same thing in other situations, simply replacing the parameters with their MLEs?

Thanks a lot!


There are 2 best solutions below


The answer to your question is yes. This is referred to as the invariance property of MLEs. We have the following theorem, taken from Casella & Berger's Statistical Inference (2nd edition, Theorem 7.2.10, p. 320).


Theorem 7.2.10. (Invariance property of MLEs)

If $\hat\theta$ is the MLE of $\theta$, then for any function $\tau(\theta)$, the MLE of $\tau(\theta)$ is $\tau(\hat\theta)$.


So in your example, if $\hat\theta$ is the MLE of the parameter $\theta$ of the Bernoulli distribution, then $\tau(\hat\theta)=\hat\theta(1-\hat\theta)$ is the MLE of the variance $\tau(\theta)=\theta(1-\theta)$. In general, the MLE of a function like this will be biased.
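A quick numerical check can make this concrete (an illustrative Python/NumPy sketch; the sample size, parameter value, and seed are arbitrary). For 0/1 data, the plug-in MLE $\hat\theta(1-\hat\theta)$ and the usual unbiased sample variance differ by exactly the factor $n/(n-1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta = 50, 0.3
x = rng.binomial(1, theta, size=n)      # a Bernoulli(theta) sample

theta_hat = x.mean()                    # MLE of theta (sample mean)
var_mle = theta_hat * (1 - theta_hat)   # plug-in (invariance) MLE of the variance
var_unbiased = x.var(ddof=1)            # the usual unbiased sample variance

# For 0/1 data: sum((x_i - mean)^2) = n * mean * (1 - mean),
# so the unbiased estimator is exactly n/(n-1) times the MLE.
assert np.isclose(var_unbiased, var_mle * n / (n - 1))
```

This identity shows directly that the plug-in MLE is smaller than the unbiased estimator, consistent with its downward bias.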


First, I would like to make some general remarks about maximum likelihood estimators in order to set up the problem correctly.

If $\mathcal{P}$ is a population on $\mathcal{X}$ (a family of probability distributions on $\mathcal{X}$) parametrized by a set $\Theta$, and each distribution in $\mathcal{P}$ has density $f(x|\theta)$, then the maximum likelihood estimator $\hat{\theta}$ of $\theta$ is defined as $$\begin{align}\hat{\theta}(x):=\operatorname{arg.max}_{\theta\in\Theta}f(x|\theta),\qquad x\in\mathcal{X}\tag{0}\label{zero}\end{align}$$

Suppose $\tau:\Theta\rightarrow T$ is a nice function of the parameter $\theta$ that describes the population. Without loss of generality assume that $T=\tau(\Theta)$. For each $t\in T$ consider the constrained (profile-likelihood) problem $$\begin{align}f^*(x|t):=\max_{\substack{\theta\in\Theta\\\tau(\theta)=t}} f(x|\theta),\qquad x\in\mathcal{X}\tag{1}\label{one}\end{align} $$ The maximum likelihood estimator $\hat{\tau}$ of $\tau(\theta)$ is defined as \begin{align} \hat{\tau}(x):=\operatorname{arg.max}_{t\in T}f^*(x|t),\qquad x\in\mathcal{X}\tag{2}\label{three} \end{align}

  • When problems \eqref{zero} and \eqref{three} are well defined and have solutions, it can be seen that $$\hat{\tau}(x)=\tau(\hat{\theta}(x))$$ This is the so-called invariance property of the maximum likelihood estimator $\hat{\theta}$. It follows from the fact that the supremum of the constrained suprema is the same as the unconstrained supremum.

That is why, in many classic treatises on statistics, the maximum likelihood estimator of any function $\tau=\tau(\theta)$ of the parameter is directly defined as $\tau(\hat{\theta})$, where $\hat{\theta}$ is the maximum likelihood estimator of the parameter $\theta$.


For the particular problem in the OP, we know that the joint distribution of $n$ i.i.d. samples from the Bernoulli distribution with parameter $p$ is described by the parametric family $$f(x_1,\ldots,x_n|p)=p^{\sum^n_{j=1}x_j}(1-p)^{n-\sum^n_{j=1}x_j}\prod^n_{j=1}\mathbb{1}_{\{0,1\}}(x_j)$$ The maximum likelihood estimator for this family is $$\hat{p}(\mathbf{x})=\frac1n\sum^n_{j=1}x_j$$
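As a sanity check (an illustrative Python sketch, not part of the original argument; sample size and seed are arbitrary), one can verify numerically that the sample mean maximizes this likelihood by a grid search over $p$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.7, size=100)
s, n = x.sum(), x.size

def loglik(p):
    # log of f(x_1,...,x_n | p) = p^s (1-p)^(n-s) on {0,1}^n
    return s * np.log(p) + (n - s) * np.log(1 - p)

grid = np.linspace(0.001, 0.999, 999)
p_grid = grid[np.argmax(loglik(grid))]   # numerical maximizer on the grid
p_hat = s / n                            # closed-form MLE: the sample mean

# The grid maximizer agrees with the closed-form MLE up to the grid spacing.
assert abs(p_grid - p_hat) < 1e-2
```

Because the log-likelihood is concave in $p$, the grid maximizer is guaranteed to lie within one grid step of the true maximizer $\hat{p}$.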

The variance of a Bernoulli r.v. with parameter $p$ is given by $\sigma^2(p):=p(1-p)$. Hence, from the discussion above, the maximum likelihood estimator $\hat{\sigma^2}$ is $$\hat{\sigma^2}(\mathbf{X})=\sigma^2(\hat{p}(\mathbf{X}))=\frac{1}{n}\sum^n_{j=1}X_j\,\Big(1-\frac{1}{n}\sum^n_{j=1}X_j\Big)$$
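To see the bias of this plug-in estimator concretely, a small Monte Carlo simulation (an illustrative sketch; the sample size, parameter, and seed are arbitrary) confirms that $E[\hat{\sigma^2}]=\frac{n-1}{n}\,p(1-p)$, so the MLE of the variance is biased low:

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta = 10, 0.4
true_var = theta * (1 - theta)               # 0.24

# Monte Carlo estimate of E[sigma^2_hat] for the plug-in MLE
samples = rng.binomial(1, theta, size=(200_000, n))
p_hat = samples.mean(axis=1)                 # MLE of p in each replication
mle_var = p_hat * (1 - p_hat)                # plug-in MLE of the variance

# Theory: E[p_hat(1-p_hat)] = theta - Var(p_hat) - theta^2
#       = theta(1-theta)(1 - 1/n) = (n-1)/n * theta(1-theta)
assert abs(mle_var.mean() - (n - 1) / n * true_var) < 2e-3
```

The shortfall factor $(n-1)/n$ vanishes as $n\to\infty$, so the plug-in MLE is biased for finite samples but asymptotically unbiased.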