Expected value and variance of $e^{\frac{2n}{\sum_{i=1}^n X_i^2}}$ (maximum likelihood estimator)


Let $\theta>1$ be an unknown parameter and let $X_1, X_2, ..., X_n$ be a random sample (which means i.i.d. in this case) from the density $f_\theta$ where $$f_{\theta}(x)=x\theta^{-\frac{x^2}{2}}\log(\theta)\mathbb{1}_{(0, \infty)}(x).$$ We are given that $\mathbb{E}_{\theta}[X_1^2]=\frac{2}{\log(\theta)}$ and $\mathbb{E}_{\theta}[X_1^4]=\frac{8}{(\log \theta)^2}$.

First, I was asked to compute the maximum likelihood estimator of $\theta$; call it $\hat{\theta}_n$. This isn't hard: I got $$\hat{\theta}_n=e^{\frac{2n}{\sum_{i=1}^n X_i^2}}.$$ What I don't know is how to prove whether this estimator is efficient, i.e. whether the Cramér–Rao bound is attained. To do so I need the expected value of my estimator and its variance. But how would I compute these? I don't know the distribution of $\sum_{i=1}^n X_i^2$, so I can't even get started using the so-called law of the unconscious statistician. How is this supposed to be done?
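(For a purely numerical feel, here is a short Monte Carlo sketch, not an answer. It uses the fact, checkable by integrating $f_\theta$, that $X_i^2$ is exponential with rate $\tfrac12\log\theta$, so $\sum_{i=1}^n X_i^2$ is Gamma with shape $n$ and scale $2/\log\theta$; the values of $\theta$, $n$, and the replication count below are arbitrary illustrative choices.)

```python
import math
import random

# Monte Carlo sketch: estimate E[theta_hat] and Var(theta_hat) by simulation.
# Assumption (checkable from f_theta): X_i^2 ~ Exponential(rate = log(theta)/2),
# so S = sum_i X_i^2 is Gamma(shape = n, scale = 2/log(theta)).
theta = 3.0        # illustrative true parameter (> 1)
n = 50             # illustrative sample size
reps = 20000       # number of simulated samples
rng = random.Random(0)
log_theta = math.log(theta)

def mle_once():
    s = sum(rng.expovariate(log_theta / 2) for _ in range(n))
    return math.exp(2 * n / s)

mles = [mle_once() for _ in range(reps)]
mean_mle = sum(mles) / reps
var_mle = sum((m - mean_mle) ** 2 for m in mles) / reps
# Empirically mean_mle exceeds theta: s -> e^{2n/s} is convex, so by Jensen's
# inequality the MLE is biased upward for finite n.
```

In particular the simulation suggests $\mathbb{E}_\theta[\hat\theta_n]\neq\theta$ for finite $n$, so the estimator is biased and the finite-sample Cramér–Rao comparison is delicate.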



Best answer:

Note that this is a regular one-parameter exponential family of distributions. Taking $\ln\theta = a$ (say), it is easy to see that $X^2$ has an exponential distribution with rate $a/2$ (hence mean $2/a$) when $X$ has the pdf $f_{\theta}$.
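(This claim is easy to sanity-check numerically. The sketch below samples $X$ by inverting the CDF $F(x)=1-\theta^{-x^2/2}$, which follows by integrating $f_\theta$, and compares the empirical mean of $X^2$ to $2/a$; the chosen $\theta$ and sample size are arbitrary.)

```python
import math
import random

# Sample X from f_theta by inverse-CDF: F(x) = 1 - theta^{-x^2/2} for x > 0,
# so F^{-1}(u) = sqrt(-2 log(1 - u) / log(theta)).
theta = 2.5                 # illustrative value (> 1)
a = math.log(theta)         # a = ln(theta), as in the answer
rng = random.Random(1)

def sample_X():
    u = rng.random()
    return math.sqrt(-2.0 * math.log(1.0 - u) / a)

N = 100_000
mean_xsq = sum(sample_X() ** 2 for _ in range(N)) / N
# X^2 should behave like an Exponential with rate a/2, i.e. mean 2/a.
```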

The likelihood for $x_1,\ldots,x_n>0$ is

$$L(\theta)=\left(\prod_{i=1}^n x_i\right)\theta^{-\sum_{i=1}^n x_i^2/2}(\ln \theta)^n \quad,\,\theta>1$$

So, log-likelihood is

$$\ell(\theta)=n\ln(\ln\theta)-\left(\frac12\sum_{i=1}^n x_i^2\right)\ln\theta + \sum_{i=1}^n \ln x_i$$

The score function is therefore

$$\ell'(\theta)=\frac{n}{\theta\ln\theta}-\frac1{2\theta}\sum_{i=1}^n x_i^2$$

Or,

$$\ell'(\theta)=-\frac{n}{\theta}\left[\frac1{2n}\sum_{i=1}^n x_i^2 - \frac1{\ln\theta}\right] $$

The last equation is of the form $\ell'(\theta)=k(\theta)(T(\boldsymbol x)-g(\theta))$, which is the equality condition of the Cramér–Rao inequality. This shows that only functions of the form $g(\theta)=\frac1{\ln \theta}$, and constant multiples of it, admit estimators whose variance attains the Cramér–Rao bound. So, by your definition of efficiency, there does not exist any efficient estimator of $\theta$ itself.

However, in this setup, maximum likelihood estimators are known to be asymptotically efficient. That is to say, the large-sample variance of your MLE does attain the Cramér–Rao bound, which is the inverse of the Fisher information.
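(A rough numerical illustration of the asymptotic claim, not a proof: for large $n$, $n\operatorname{Var}_\theta(\hat\theta_n)$ should be close to the inverse per-observation Fisher information, which for this family works out to $\theta^2(\log\theta)^2$, consistent with the Fisher information $I(f,\theta)=\frac{n}{\theta^2\log^2\theta}$ computed in the other answer. The parameter values below are arbitrary.)

```python
import math
import random

# Simulate n * Var(theta_hat_n) and compare to theta^2 * (log theta)^2,
# the inverse per-observation Fisher information for this family.
# Assumption (checkable from f_theta): X_i^2 ~ Exponential(rate = log(theta)/2).
theta = 3.0
log_theta = math.log(theta)
n = 200            # moderately large sample, so the asymptotics roughly apply
reps = 10000
rng = random.Random(2)

def mle_once():
    s = sum(rng.expovariate(log_theta / 2) for _ in range(n))
    return math.exp(2 * n / s)

est = [mle_once() for _ in range(reps)]
mean_est = sum(est) / reps
scaled_var = n * sum((e - mean_est) ** 2 for e in est) / reps
crb = theta ** 2 * log_theta ** 2    # inverse per-observation Fisher information
# scaled_var should land within a few percent of crb at this sample size.
```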

Second answer:

Preliminaries:

First, a review of results that will be useful:

  • A. (Cramér–Rao bound) Recall that if $\{f_\theta:\theta\in\Theta\subset\mathbb{R}^k\}$ denotes a population dominated by a $\sigma$-finite measure $\nu$ ($f_\theta$ are densities relative to $\nu$) where $\Theta$ is open, $T(X)$ is an $\mathbb{R}^k$-valued statistic, and $f_\theta$ is differentiable in $\theta$, then under the regularity conditions \begin{align} \partial_\theta\mathbb{E}_\theta[h(X)]=\int h(x)\partial_\theta f_\theta(x)\,\nu(dx)=\mathbb{E}_\theta[h(X)\partial_\theta \log(f_\theta(X))]\tag{0}\label{zero} \end{align} for $h=1$ and $h=T$, and with $g(\theta):=\mathbb{E}_\theta[T(X)]$, $$\operatorname{Var}_\theta[T(X)]\geq \big[\partial_\theta g(\theta)\big]^\intercal \big[I(f,\theta)\big]^{-1} \big[\partial_\theta g(\theta)\big]$$ where \begin{align}I(f,\theta):=\mathbb{E}_\theta\Big[\big[\partial_\theta \log(f_\theta(X))\big]^\intercal\big[\partial_\theta \log(f_\theta(X))\big]\Big]\end{align} Here $I$ is a real symmetric $k\times k$ matrix, and the inequality is in the sense of positive semidefinite matrices: $A\geq0$ if $x^\intercal Ax\geq0$ for all $x$.

  • B. If $\psi:U\subset\mathbb{R}^k\rightarrow \Theta$ is a $C^1$- diffeomorphism and the population $\tilde{f}_\eta(x):=f_{\psi(\eta)}(x)$ where $\theta=\psi(\eta)$, then by the chain rule it follows that under the parametrization $\{\tilde{f}_\eta:\eta\in U\}$, \begin{align} I(\tilde{f},\eta)=\big[\partial_\eta\psi(\eta)\big]^\intercal I(f,\psi(\eta) )\big[\partial_\eta\psi(\eta)\big]\tag{1}\label{one} \end{align}

  • C. If in addition $f_\theta$ is twice differentiable in $\theta$ and \begin{align} \operatorname{Hess}_{\theta}\mathbb{E}_\theta[h(X)]=\int h(x)\operatorname{Hess}_\theta f_\theta(x)\,\nu(dx) \end{align} for $h=1$, then \begin{align} I(f,\theta)=-\mathbb{E}_\theta\big[\operatorname{Hess}_\theta\log(f_\theta(X))\big]\tag{2}\label{two} \end{align} where $\operatorname{Hess}_\theta$ is the Hessian or total second derivative with respect to $\theta$. This follows from the fact that $$ \partial^2_{\theta^2}\log(f_\theta(x))=\frac{\partial^2_{\theta^2}f_\theta(x)}{f_\theta(x)}-\big[\partial_\theta \log(f_\theta(x))\big]^\intercal\big[\partial_\theta \log(f_\theta(x))\big] $$


The exponential case:

When $f_\theta(x):=\exp\big(p(\theta)T(x)-\xi(\theta)\big)c(x)$, $\theta\in \Theta\subset\mathbb{R}^k$ is an exponential family with natural parameter $\eta=p(\theta)$ (we assume that $p:\Theta\rightarrow U\subset\mathbb{R}^k$ is a $C^1$-injective function so that $\zeta(\eta):=\xi(p^{-1}(\eta))$ is differentiable), then $$\operatorname{Var}_\eta(T(X))=I(\tilde{f},\eta)$$ Indeed, since $\tilde{f}_\eta(x)=\exp\big(\eta^\intercal T(x)-\zeta(\eta)\big)\,c(x)$ and $\zeta$ is differentiable, \begin{align} \partial_\eta \log(\tilde{f}_\eta(x))&=T(x)-\partial_\eta \zeta(\eta) \end{align} Hence, by (A) and (C) \begin{align} \mathbb{E}_\eta[T(X)]=\partial_\eta\zeta(\eta)\qquad I(\tilde{f},\eta)=\operatorname{Var}_\eta[T(X)]=\operatorname{Hess}_\eta\zeta(\eta)\tag{3}\label{three} \end{align}

Suppose further that $\psi(\eta):=\mathbb{E}_\eta[T(X)]=\zeta'(\eta)$ is a $C^1$-diffeomorphism. Then, by (B) the information function $I$ with respect to the parameter $\vartheta=\psi(\eta)$ satisfies $$ I(\tilde{f},\eta)=\big[\partial_\eta\psi(\eta)\big]^\intercal I(\tilde{\tilde{f}},\vartheta)\big[\partial_\eta\psi(\eta)\big] $$ where $\tilde{\tilde{f}}_\vartheta(x)=\tilde{f}_{\psi^{-1}(\vartheta)}(x)$. Since $\psi(\eta)=\partial_\eta\zeta(\eta)$, we have that $\partial_\eta\psi(\eta)=\operatorname{Hess}_\eta(\zeta(\eta))=\operatorname{Var}_\eta[T(X)]$. Consequently

\begin{align} I(\tilde{\tilde{f}},\vartheta)=\Big[I(\tilde{f},\eta)\Big]^{-1}=\Big[\operatorname{Var}_\eta[T(X)]\Big]^{-1}\tag{4}\label{four} \end{align}


The problem in the OP:

The problem in the OP is of the form described above: $$f_\theta(\mathbf{x})=e^{-\frac{\log\theta}{2}\sum^n_{j=1}x^2_j+n\log(\log\theta)}\,x_1\cdots x_n\,\mathbb{1}_{(0,\infty)^n}(\mathbf{x})$$ In the natural parameter $\eta=-\frac12\log(\theta)$, $$ \tilde{f}_\eta(\mathbf{x})=e^{\eta\sum^n_{j=1}x^2_j +n\log(-2\eta)}\,x_1\cdots x_n\,\mathbb{1}_{(0,\infty)^n}(\mathbf{x})$$

Let $T(\mathbf{x})=\sum^n_{j=1}x^2_j$ and $\zeta(\eta)=-n\log(-2\eta)$ so that \begin{align} \mathbb{E}_\eta[T(X)] &=\zeta'(\eta)=-\frac{n}{\eta}\\ I(\tilde{f},\eta)&=\operatorname{Var}_\eta[T(X)]=\zeta''(\eta)=\frac{n}{\eta^2} \end{align}
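(A quick finite-difference sanity check of these two derivatives of $\zeta(\eta)=-n\log(-2\eta)$; the values of $n$ and $\eta<0$ are arbitrary.)

```python
import math

# Finite-difference check of zeta'(eta) = -n/eta and zeta''(eta) = n/eta^2
# for zeta(eta) = -n log(-2 eta), eta < 0. The values of n and eta are arbitrary.
n = 7
eta = -0.8
h = 1e-5

def zeta(e):
    return -n * math.log(-2 * e)

d1 = (zeta(eta + h) - zeta(eta - h)) / (2 * h)                 # approximates zeta'(eta)
d2 = (zeta(eta + h) - 2 * zeta(eta) + zeta(eta - h)) / h ** 2  # approximates zeta''(eta)
```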

In terms of the parameter $\vartheta=\psi(\eta):=\mathbb{E}_\eta[T(X)]=-\tfrac{n}{\eta}$ we have that $$I(\tilde{\tilde{f}},\vartheta)=\frac{1}{I(\tilde{f},\eta)}=\frac{\eta^2}{n}=\frac1n\frac{n^2}{\vartheta^2}=\frac{n}{\vartheta^2} $$


In terms of the parameter $\theta=e^{-2\eta}$, we have that $$\frac{n}{\eta^2}=I(\tilde{f},\eta)=\big(D_\eta(e^{-2\eta})\big)^2I(f,\theta)=4e^{-4\eta}I(f,\theta)$$ and so $$I(f,\theta)=\frac{n e^{4\eta}}{4\eta^2}=\frac{n}{\theta^2\log^2\theta} $$ Now $g(\theta)=E_\theta[T(X)]=-\frac{n}{\eta(\theta)}=\frac{2n}{\log\theta}$, and so $g'(\theta)=-\frac{2n}{\theta\log^2\theta}$. Also, $\operatorname{Var}_\theta[T(X)]=\frac{4n}{\log^2\theta}$. Then $$(g'(\theta))^2\big(I(f,\theta)\big)^{-1}=\operatorname{Var}_\theta[T(X)]$$ that is, $T(\mathbf{X})=\sum^n_{j=1}X_j^2$ attains the Cramér–Rao bound as an estimator of $g(\theta)=\frac{2n}{\log\theta}$, so equality holds in the Cramér–Rao inequality for the example in the OP.
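(As a final sanity check, one can plug arbitrary numbers into the identity $(g'(\theta))^2\,I(f,\theta)^{-1}=\operatorname{Var}_\theta[T(X)]$; the $\theta$ and $n$ below are illustrative only.)

```python
import math

# Numerical check of (g')^2 / I = Var[T] with arbitrary illustrative values.
theta = 4.0
n = 10
L = math.log(theta)

I_theta = n / (theta ** 2 * L ** 2)   # Fisher information I(f, theta)
g_prime = -2 * n / (theta * L ** 2)   # g'(theta) for g(theta) = 2n / log(theta)
var_T = 4 * n / L ** 2                # Var_theta[T(X)]
lhs = g_prime ** 2 / I_theta          # left-hand side of the identity
```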