How do you prove that the MLE is sufficient for an exponential family?

I read (sorry, I only found this theorem in French: https://perso.math.univ-toulouse.fr/jydauxoi/files/2013/11/cours_stat_inf_Master.pdf ) that whenever a sufficient statistic exists for an exponential family, the maximum likelihood estimator is a function of it.

I'd like to prove something lighter: how do you prove that the MLE of an exponential family is sufficient?

Thank you.

Best answer

I think the logic is more or less along these lines, following the conventions of Wikipedia:

For a distribution belonging to the natural exponential family, the likelihood function given the data $x$ is of the form

$$\ell(\eta)=\exp\left\{\eta\,T(x)-A(\eta)\right\}h(x)$$

Here note that $T$ is a sufficient statistic for the parameter $\eta$.

Then the likelihood equation $\frac{\partial}{\partial\eta}\ln \ell(\eta)=0$ reduces to $A'(\eta)=T(x)$, which has a solution provided $T(x)$ lies in the range of $A'$. Moreover, $\frac{\partial^2}{\partial\eta^2}\ln \ell(\eta)=-A''(\eta)=-\operatorname{Var}_\eta(T)$ (see Wikipedia).

Since the variance is non-negative, $-\ln \ell(\eta)$ is convex in $\eta$, so any solution of the likelihood equation is a global maximiser. If the variance is strictly positive (that is, $T$ is not degenerate), then $A'$ is strictly increasing and the solution is unique. Hence $T(x)$ is the unique MLE of the mean parameter $\mu(\eta)=A'(\eta)=E_\eta[T]$. The function $\mu$ is then one-to-one, so $\mu^{-1}$ exists. Therefore, by invariance, the MLE of $\eta$ is $\hat\eta=\mu^{-1}(T(x))$.
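As a numerical sanity check of this argument, here is a minimal sketch for the Poisson family, where $A(\eta)=e^\eta$ and so $\mu^{-1}(t)=\ln t$. It assumes an i.i.d. sample of size $n$ (not in the answer above, which uses a single observation), so the log-likelihood becomes $\eta\,T-nA(\eta)$ with $T=\sum_i x_i$. The function name `poisson_eta_mle` and the data are made up for illustration.

```python
import math

def poisson_eta_mle(data, max_iter=100, tol=1e-12):
    """Newton's method for the natural-parameter MLE of an i.i.d. Poisson sample.

    Log-likelihood (up to the h(x) term): eta * T - n * A(eta), with A(eta) = exp(eta).
    """
    n = len(data)
    t = sum(data)                       # sufficient statistic T(x)
    if t <= 0:
        raise ValueError("T(x)/n must lie in the range of A', i.e. be positive")
    eta = 0.0                           # arbitrary starting point
    for _ in range(max_iter):
        score = t - n * math.exp(eta)   # first derivative:  T - n A'(eta)
        hessian = -n * math.exp(eta)    # second derivative: -n A''(eta) < 0
        step = score / hessian
        eta -= step                     # Newton update
        if abs(step) < tol:
            break
    return eta

sample_a = [2, 4, 3, 5, 1]   # made-up counts, T = 15
sample_b = [3, 3, 3, 3, 3]   # different data, same T = 15

eta_a = poisson_eta_mle(sample_a)
eta_b = poisson_eta_mle(sample_b)
closed_form = math.log(sum(sample_a) / len(sample_a))   # mu^{-1}(T/n) = ln(T/n)

print(eta_a, eta_b, closed_form)
```

The two samples differ but share the same value of $T$, so the numerical maximiser is the same for both and agrees with $\ln(T/n)$: the MLE depends on the data only through the sufficient statistic.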

For a general exponential family, the likelihood function can be written as

$$\ell(\theta)=\exp\left\{\eta(\theta)T(x)-B(\theta)\right\}h(x)$$

Now the MLE of $\theta$ is $\hat\theta=\eta^{-1}(\hat\eta)$, provided $\eta^{-1}$ exists and $\hat\eta$ is in the range of $\eta(\theta)$.

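Here too, $\hat\theta$ satisfies the likelihood equation $\frac{\partial}{\partial\theta}\ln\ell(\theta)=\eta'(\theta)T(x)-B'(\theta)=0$. Indeed, since $B(\theta)=A(\eta(\theta))$, the left-hand side equals $\eta'(\theta)\left\{T(x)-A'(\eta(\theta))\right\}$, which vanishes at $\theta=\hat\theta$ when $\eta$ is differentiable.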
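To make the general case concrete, here is a small sketch using the exponential distribution with rate $\theta$, for which $\eta(\theta)=-\theta$, $T(x)=x$, $B(\theta)=-\ln\theta$ and $h(x)=1$. As an assumption beyond the single-observation form above, it uses an i.i.d. sample of size $n$, so $T=\sum_i x_i$ and $B$ is multiplied by $n$. The helper names and the data are hypothetical.

```python
import math

def log_lik(theta, t, n):
    # ln l(theta) = eta(theta) * T - n * B(theta), with eta(theta) = -theta, B(theta) = -ln(theta)
    return -theta * t + n * math.log(theta)

def score(theta, t, n):
    # Likelihood equation: eta'(theta) * T - n * B'(theta) = -T + n / theta
    return -t + n / theta

data = [0.5, 1.5, 2.0, 4.0]    # made-up positive observations
n, t = len(data), sum(data)    # t is the sufficient statistic T(x)

eta_hat = -n / t               # solves n * A'(eta) = T, where A(eta) = -ln(-eta)
theta_hat = -eta_hat           # theta = eta^{-1}(eta_hat), since eta(theta) = -theta

print(theta_hat, score(theta_hat, t, n))
```

Here $\hat\theta=n/T$ is a one-to-one function of $T$, it solves the likelihood equation, and nearby values of $\theta$ give a smaller log-likelihood.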

So the MLE in an exponential family, whenever it exists, is a function of a sufficient statistic. If this function is one-to-one, then the MLE is itself sufficient. These results generalise in a similar manner to the case where the parameter is vector-valued.