Maximum Likelihood Estimation with Indicator Function


I need to solve this exercise from the book below.

$\textbf{Mathematical Statistics, Knight (2000)}$

$\textbf{Problem 6.17}$

Suppose that $X_1,\ldots,X_n$ are i.i.d. random variables with frequency function

\begin{equation} f(x;\theta)=\begin{cases} \theta & \text{for } x=-1,\\ (1- \theta)^2 \theta^x & \text{for } x=0,1,2,\ldots \end{cases} \end{equation}

(a) Find the Cramér-Rao lower bound for unbiased estimators based on $X_1,\ldots,X_n$.

(b) Show that the maximum likelihood estimator of $\theta$ based on $X_1,\ldots,X_n$ is

$$\hat{\theta}_n = \frac{2 \sum_{i=1}^{n} I_{(X_{i}=-1)} + \sum_{i=1}^n X_i}{2n + \sum_{i=1}^n X_i}$$

and show that $\{\hat{\theta}_n\}$ is consistent for $\theta$.

(c) Show that $\sqrt{n}(\hat{\theta}_n-\theta)\to_d N(0,\sigma^2(\theta))$ and find the value of $\sigma^2(\theta)$. Compare $\sigma^2(\theta)$ to the Cramér-Rao lower bound in part (a).


I have no clue how to solve (a) or (c).

I started to solve (b) but I can't seem to arrive at the desired solution. I'm getting this:

\begin{align} \mathcal{L} &= \prod_{i=1}^n \left[(1-\theta)^2 \theta^{x_i}\right]^{I_{(X_i \geq 0)}} \theta^{I_{(X_{i}=-1)}} \\ &= (1-\theta)^{2 \sum_{i=1}^n I_{(X_i \geq 0)}} \, \theta^{\sum_{i=1}^n x_i I_{(X_i \geq 0)} + \sum_{i=1}^n I_{(X_{i}=-1)}} \\ \log \mathcal{L} &= 2 \sum_{i=1}^n I_{(X_i \geq 0)} \log(1-\theta) + \sum_{i=1}^n x_i I_{(X_i \geq 0)} \log \theta + \sum_{i=1}^n I_{(X_i=-1)} \log \theta \end{align}

$\textbf{FOC}$

\begin{align} 0 &= - \frac{2 \sum_{i=1}^n I_{(X_i \geq 0)}}{1-\theta} + \frac{\sum_{i=1}^n x_i I_{(X_i \geq 0)}}{\theta} + \frac{\sum_{i=1}^n I_{(X_i=-1)}}{\theta} \\[10pt] \hat{\theta}_n &= \frac{\sum_{i=1}^n I_{(X_i=-1)} + \sum_{i=1}^n x_i I_{(X_i \geq 0)}}{\sum_{i=1}^n I_{(X_i=-1)} + \sum_{i=1}^n x_i I_{(X_i \geq 0)} + 2 \sum_{i=1}^n I_{(X_i \geq 0)}} \end{align}

which differs from the result I'm given...

Any help would be greatly appreciated.

$\textbf{Accepted Answer}$

The provided solution suggests that for a sample $\boldsymbol x = (x_1, \ldots, x_n)$, we define the random variable $K = \sum_{i=1}^n \mathbb 1_{x_i = -1}$; that is, $K$ counts the number of observations equal to $-1$. Then $$\begin{align*} \mathcal L(\theta \mid \boldsymbol x) &= \theta^K \prod_{j=1}^{n-K} (1-\theta)^2 \theta^{x_{[j]}} \\ &= \theta^K (1-\theta)^{2(n-K)} \theta^{\sum_{j=1}^{n-K} x_{[j]}}, \end{align*}$$ where $x_{[j]}$ denotes the $j^{\rm th}$ nonnegative observation of $\boldsymbol x$.

But note that, since each of the $K$ observations equal to $-1$ contributes $-1$ to the sample total, $$\sum_{j=1}^{n-K} x_{[j]} = K + \sum_{i=1}^n x_i = K + n \bar x.$$ So we may write the log-likelihood as $$\ell (\theta \mid \boldsymbol x) = 2(n-K) \log (1-\theta) + (2K + n\bar x) \log \theta. $$

Thus the log-likelihood is maximized at a critical point satisfying $$0 = \frac{\partial \ell}{\partial \theta} = -\frac{2(n-K)}{1 - \theta} + \frac{2K +n\bar x}{\theta},$$ which gives $$\hat \theta = \frac{2K + n \bar x}{n (2+\bar x)}.$$ This is equivalent to the stated solution, just written more compactly.
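As a sanity check on this estimator (and a preview of the consistency claim in (b)), one can simulate from $f(x;\theta)$: with probability $\theta$ the draw is $-1$, and conditional on $X \ge 0$ we have $P(X = x \mid X \ge 0) = (1-\theta)\theta^x$, a Geometric distribution with success probability $1-\theta$ counting failures. A minimal Python sketch, with arbitrary choices of $\theta$, seed, and sample size:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.4      # arbitrary value chosen for the demonstration
n = 200_000

# Draw from f(x; theta): X = -1 with probability theta; otherwise
# P(X = x | X >= 0) = (1 - theta) * theta**x, i.e. Geometric(1 - theta)
# on {0, 1, 2, ...} (numpy's geometric counts trials, so subtract 1).
u = rng.random(n)
geo = rng.geometric(1 - theta_true, size=n) - 1
x = np.where(u < theta_true, -1, geo)

# MLE from part (b): (2 * #{x_i = -1} + sum x_i) / (2n + sum x_i).
K = np.sum(x == -1)
theta_hat = (2 * K + x.sum()) / (2 * n + x.sum())
print(theta_hat)      # close to 0.4 for large n
```

With $n$ this large, $\hat\theta_n$ lands within a few thousandths of the true $\theta$, consistent with part (b).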

You may find that working with $K$ and avoiding the unnecessary use of additional indicator functions (you really only need one, namely $\mathbb 1_{x_i = -1}$) will reduce your chances of making errors. Please feel free to attempt the other parts of the question.


If you find it difficult to follow the above solution, it is helpful to consider a numeric example. Suppose you are given the sample $$\boldsymbol x = (-1, 0, 1, 3, -1, 5, -1).$$ Then $n = 7$, $K = 3$, and the sample total is $n \bar x = 6$. We observe that $K + n\bar x = 3 + 6 = 9$, which is equal to the sum of nonnegative observations $\sum_{j=1}^{n-K} x_{[j]} = 0 + 1 + 3 + 5 = 9$.

The resulting likelihood function is $$\mathcal L (\theta \mid \boldsymbol x) = \theta^3 (1-\theta)^{2(7-3)} \theta^{0+1+3+5} = \theta^{12} (1-\theta)^8.$$ This is maximized when $\hat \theta = 12/(8+12) = 3/5$.
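The arithmetic in this example is easy to verify in a few lines of Python; the closed form and a brute-force grid search over $\theta^{12}(1-\theta)^8$ agree (this is just an illustration of the example above, not part of the original solution):

```python
import numpy as np

x = np.array([-1, 0, 1, 3, -1, 5, -1])   # the sample from the example
n = len(x)
K = int(np.sum(x == -1))                 # K = 3

# Closed-form MLE: (2K + n*xbar) / (2n + n*xbar) = 12/20.
theta_hat = (2 * K + x.sum()) / (2 * n + x.sum())

# Brute force: maximize log[theta^12 (1 - theta)^8] on a fine grid.
theta = np.linspace(1e-6, 1 - 1e-6, 100_001)
peak = theta[np.argmax(12 * np.log(theta) + 8 * np.log(1 - theta))]

print(theta_hat)                          # 0.6
print(abs(peak - theta_hat) < 1e-4)       # True
```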

$\textbf{Another Answer}$

\begin{align} L(\theta) & = \prod_{i=1}^n \theta^{I_{x_i=-1}} (1-\theta)^{2 I_{x_i\ge 0} } \theta^{x_i I_{x_i\ge 0}}. \\[10pt] \ell(\theta) = \log L(\theta) & = (\log\theta)\sum_{i=1}^n (I_{x_i=-1} + x_i I_{x_i \ge 0} ) + 2(\log(1-\theta)) \sum_{i=1}^n I_{x_i\ge0} \\[10pt] \ell\,'(\theta) & = \frac 1 \theta \sum_{i=1}^n (I_{x_i=-1} + x_i I_{x_i \ge 0}) - \frac 2 {1-\theta} \sum_{i=1}^n I_{x_i\ge 0} = \frac A \theta -2\frac B {1-\theta} \\[10pt] & = 0 \text{ if and only if } A(1-\theta) - 2B\theta = 0, \\[10pt] & \qquad \text{and that holds precisely if } A = 2B\theta+A\theta = (A+2B)\theta, \text{ so} \\[10pt] \theta & = \frac A {A+2B} = \frac{\sum_{i=1}^n (I_{x_i=-1} + x_i I_{x_i \ge 0})}{\sum_{i=1}^n (I_{x_i=-1} + x_i I_{x_i \ge 0}) + 2\sum_{i=1}^n I_{x_i\ge 0}}. \end{align}
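This expression is algebraically the same as the $\hat{\theta}_n$ stated in the problem: writing $K = \sum_i I_{x_i=-1}$, the numerator is $K + \sum_{x_i \ge 0} x_i = 2K + \sum_i x_i$, and the denominator is $2K + \sum_i x_i + 2(n-K) = 2n + \sum_i x_i$. A quick numeric check in Python (reusing the sample from the other answer) makes the equivalence concrete:

```python
import numpy as np

x = np.array([-1, 0, 1, 3, -1, 5, -1])
n = len(x)
neg = x == -1

# A = sum of I(x_i = -1) + x_i * I(x_i >= 0);  B = #{x_i >= 0}.
A = np.sum(neg) + x[~neg].sum()
B = np.sum(~neg)
theta_AB = A / (A + 2 * B)

# The form stated in part (b): (2K + sum x) / (2n + sum x).
K = np.sum(neg)
theta_b = (2 * K + x.sum()) / (2 * n + x.sum())

print(theta_AB, theta_b)    # both equal 0.6
print(theta_AB == theta_b)  # True
```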