Estimate an unknown parameter by the maximum likelihood method and the method of moments


Let $X_1, X_2, \ldots, X_n$ be a random sample from the distribution:

| $x$ | $-3$ | $0$ | $3$ |
| --- | --- | --- | --- |
| $P(X = x)$ | $\frac{1}{3} - \theta$ | $\frac{1}{3} + 2\theta$ | $\frac{1}{3} - \theta$ |

Estimate the unknown parameter by the maximum likelihood method and the method of moments. Are these estimates unbiased and consistent?

Moments:

$ L(\theta) = (\frac{1}{3} - \theta)(\frac{1}{3} + 2\theta)(\frac{1}{3} - \theta) = \frac{1}{27} + 2\theta^3-\theta^2 $

$ L'(\theta)=6\theta^2-2\theta=0$

$ \theta = 0, \theta = \frac{1}{3}$

Maximum likelihood:

$ EX = -3(\frac{1}{3} - \theta)+0(\frac{1}{3}+2\theta)+3(\frac{1}{3}-\theta)=0 $

$ EX^2 = (-3)^2(\frac{1}{3} - \theta)+0^2(\frac{1}{3}+2\theta)+3^2(\frac{1}{3}-\theta) = 6 - 18\theta $

$ DX = EX^2 - (EX)^2 = 6-18\theta $

$ \sum_{i=1}^{3} (x_{avg}-x_{i})^2 = (\frac{1}{3}-\frac{1}{3}+\theta)^2 + (\frac{1}{3}-\frac{1}{3}+2\theta)^2+(\frac{1}{3}-\frac{1}{3}+\theta)^2=6\theta^2 $

$ 6\theta^2 = 6 - 18\theta$

$\theta = \frac{-3-\sqrt{13}}{2} \approx -3.30$ or $ \theta = \frac{-3+\sqrt{13}}{2} \approx 0.30 $

If this is correct, what should come next in testing unbiasedness and consistency?


There are 2 answers below.

Best answer:

I think you have confused the specification of the probability mass function for the sample. Specifically, the table gives $$\begin{align} \Pr[X = -3] &= \frac{1}{3} - \theta, \\ \Pr[X = 0] &= \frac{1}{3} + 2\theta, \\ \Pr[X = 3] &= \frac{1}{3} - \theta, \end{align} \tag{1}$$ and you can check that the sum of these probabilities always equals $1$ for any choice of the parameter $\theta$. This is the theoretical probability distribution from which observations in the sample are drawn.

So for instance, if $\theta = 0$, then the observations $X_i$ are drawn from a discrete uniform random variable on the set $\{-3, 0, 3\}$, and you would expect that each of these outcomes would appear with roughly equal frequency in such a sample.

However, if it were the case that $\theta = \frac{3}{10}$, then $\Pr[X = -3] = \Pr[X = 3] = \frac{1}{30}$, and a sample drawn from such a distribution would have relatively few of these outcomes.

So this is the idea behind parameter estimation: we are using the sample to make some kind of inference about the value of the parameter that was used to generate the sample. An estimator is a function of the sample that provides this estimate, and as such, can only depend on known quantities.

Think of it like a game: suppose I gave you a sample of size $n = 10$:

$$(x_1, \ldots, x_{10}) = (0, 3, -3, -3, -3, -3, 0, 3, -3, -3). \tag{2}$$

How would you use this to estimate $\theta$? Does the order of the outcomes matter? Your intuition would suggest not; moreover, you would probably start by counting the frequencies of each outcome: there were six $-3$s, two $0$s, and two $3$s. In a sense, the statistic $(6,2,2)$ captures all of the information about $\theta$ that was present in the original sample: which positions in the sample took which value is not relevant to $\theta$, because the individual observations are IID, so any permutation of the outcomes in $(2)$ has the same joint probability of occurring.

But this statistic $(6,2,2)$ doesn't actually directly provide an estimate for $\theta$. We have to do something more to it. One simple approach is to assume that the mean of the sample is representative of the theoretical mean (expected value) of the underlying distribution from which it is drawn; i.e., $$\frac{1}{n} \sum_{i=1}^n X_i = \operatorname{E}[X]. \tag{3}$$ This is what we call a method of moments estimator, because we are matching sample moments (in this case, the first moment) to the theoretical moments. The LHS is just the sample mean, and that depends on the specific observations $X_i$ in our sample. The RHS is calculated from $(1)$:

$$\operatorname{E}[X] = (-3)\Pr[X = -3] + (0)\Pr[X = 0] + (3)\Pr[X = 3] = -1 + 3\theta + 1 - 3\theta = 0. \tag{4}$$

Well that isn't helpful. The expected value of $X$ is simply $0$, no matter what the value of $\theta$. So this does not result in a useful estimator, since it implies that the sample mean will not depend on $\theta$.

The solution is to match on higher moments: we calculate the second raw moment

$$\operatorname{E}[X^2] = (-3)^2 \Pr[X = -3] + (0)^2 \Pr[X = 0] + (3)^2 \Pr[X = 3] = 6(1-3\theta). \tag{5}$$

Now this is a quantity that does depend on $\theta$. So we equate the second raw sample moment to it and solve for $\theta$:

$$\frac{1}{n} \sum_{i=1}^n X_i^2 = \operatorname{E}[X^2] = 6(1-3\theta)$$

gives us the estimator $$\tilde \theta = \frac{1}{3} - \frac{1}{18n} \sum_{i=1}^n X_i^2. \tag{6}$$

Now we use this to compute a point estimate for the sample in $(2)$: $$\tilde\theta = \frac{1}{3} - \frac{6(-3)^2 + 2(0)^2 + 2(3)^2}{18(10)} = -\frac{1}{15}. \tag{7}$$
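As a sanity check, the estimator $(6)$ and the point estimate $(7)$ can be reproduced with a short Python sketch; the standard-library `fractions` module keeps the arithmetic exact:

```python
from fractions import Fraction

# Method-of-moments estimator (6): theta = 1/3 - (1/(18n)) * sum(x_i^2),
# evaluated on the sample (2) from the answer.
sample = [0, 3, -3, -3, -3, -3, 0, 3, -3, -3]
n = len(sample)

theta_mom = Fraction(1, 3) - Fraction(sum(x * x for x in sample), 18 * n)
print(theta_mom)  # -1/15, matching (7)
```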

This represents our "guess" of the value of $\theta$ that was used to generate that sample. But $(6)$ is not the only way to estimate $\theta$: another approach is to choose the value that maximizes the likelihood function $$\mathcal L(\theta \mid x_1, \ldots, x_n) \propto \prod_{i=1}^n \Pr[X = x_i], \tag{8}$$ which in our case, is better written using auxiliary random variables. Let $Y_1, Y_2, Y_3$ represent the frequencies of outcomes in the sample that are equal to $-3$, $0$, and $3$, respectively. Then $Y_1 + Y_2 + Y_3 = n$, the sample size, and the joint probability of the sample can be written as

$$\prod_{i=1}^n \Pr[X = x_i] = \left(\frac{1}{3} - \theta\right)^{y_1} \left(\frac{1}{3} + 2\theta\right)^{y_2} \left(\frac{1}{3} - \theta\right)^{y_3}, \tag{9}$$

where $y_1, y_2, y_3$ count the number of outcomes that equal $-3, 0, 3$ respectively in the sample $(x_1, \ldots, x_n)$. So we can make things a bit simpler and write the likelihood in terms of the $y_i$:

$$\mathcal L(\theta \mid y_1, y_2, y_3) \propto \left(\frac{1}{3} - \theta\right)^{y_1} \left(\frac{1}{3} + 2\theta\right)^{y_2} \left(\frac{1}{3} - \theta\right)^{y_3}. \tag{10}$$

To find the choice of $\theta$ that maximizes $\mathcal L$, we calculate the critical points of the log-likelihood:

$$\begin{align} 0 = \frac{\partial \ell}{\partial \theta} &= \frac{\partial}{\partial \theta}\left[ (y_1 + y_3) \log \left(\frac{1}{3} - \theta \right) + y_2 \log \left(\frac{1}{3} + 2\theta\right) \right] \\ &= -\frac{y_1 + y_3}{\frac{1}{3} - \theta} + \frac{2y_2}{\frac{1}{3} + 2\theta}, \tag{11} \end{align}$$

and I leave it as an exercise for you to show that the solution to $(11)$ is $$\hat \theta = \frac{y_2}{2n} - \frac{1}{6}. \tag{12}$$

Interestingly, what this suggests is that all you actually need to know for the maximum likelihood estimator is the number of zeroes in the sample. So for the sample $(2)$, the MLE is $$\hat \theta = \frac{2}{2(10)} - \frac{1}{6} = -\frac{1}{15}. \tag{13}$$ This happens to be equivalent to the MOM estimator. But is this always true, for any sample? How would you prove or disprove it?
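A short sketch checking $(12)$ and $(13)$ with exact rational arithmetic, and comparing against the MOM estimate $(6)$ on the same sample:

```python
from fractions import Fraction

# MLE (12) for the sample (2): only the count of zeroes matters.
sample = [0, 3, -3, -3, -3, -3, 0, 3, -3, -3]
n = len(sample)
y2 = sample.count(0)

theta_mle = Fraction(y2, 2 * n) - Fraction(1, 6)
# MOM estimator (6) on the same sample, for comparison.
theta_mom = Fraction(1, 3) - Fraction(sum(x * x for x in sample), 18 * n)

print(theta_mle)               # -1/15, matching (13)
print(theta_mle == theta_mom)  # True: the two estimates agree on this sample
```

Whether that agreement is a coincidence of this particular sample is exactly the exercise posed above.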


Regarding bias and consistency, these I have left to you as a further exercise, as you have already been given the correct answers for the MOM and MLE estimators.
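As a starting point for the bias part of the exercise, here is a rough Monte Carlo sketch: it draws repeated samples at an illustrative true value $\theta = 0.1$ (any value in $(-\frac{1}{6}, \frac{1}{3})$ keeps the pmf in $(1)$ valid, an assumption of this sketch) and averages the MLE $(12)$. This only probes bias empirically; it is not a proof.

```python
import random

random.seed(0)

def draw_sample(theta, n):
    """Draw n observations from the pmf (1); theta must lie in [-1/6, 1/3]."""
    return random.choices([-3, 0, 3],
                          weights=[1/3 - theta, 1/3 + 2 * theta, 1/3 - theta],
                          k=n)

def mle(sample):
    """The MLE (12): y2/(2n) - 1/6, where y2 counts the zeroes."""
    n = len(sample)
    return sample.count(0) / (2 * n) - 1 / 6

theta_true = 0.1   # illustrative true value, assumed for the simulation
reps, n = 2000, 100
avg = sum(mle(draw_sample(theta_true, n)) for _ in range(reps)) / reps
print(abs(avg - theta_true))   # small if the estimator shows little bias here
```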

Second answer:
  1. You wrote under "Moments" what should be the maximum likelihood approach, and vice versa.

  2. In both cases, the estimator should be a function of the sample; hence your approach, which yields expressions in $\theta$ alone, cannot work.

  3. However, you computed the mean and variance correctly. If $\widehat{\sigma^2}$ denotes the empirical variance, then using $DX=6-18\theta$ you get the estimator $\widehat{\theta}=(6-\widehat{\sigma^2})/18$.

  4. The likelihood is given by $(1/3-\theta)^{k(x_1,\dots,x_n)}(1/3+2\theta)^{n-k(x_1,\dots,x_n)}$, where $k(x_1,\dots,x_n)$ is the number of indices $i$ such that $x_i\in\{-3,3\}$. Therefore, you have to optimize a polynomial function of $\theta$.
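The polynomial in point 4 can be maximized numerically as a check. The sketch below runs a simple grid search over the valid range $\theta \in [-\frac{1}{6}, \frac{1}{3}]$ and compares with the closed form $\frac{n-k}{2n} - \frac{1}{6}$ from the accepted answer (its $(12)$, with $k = n - y_2$):

```python
# Grid-search sketch: maximize L(t) = (1/3 - t)^k * (1/3 + 2t)^(n - k)
# over the valid parameter range t in [-1/6, 1/3].
sample = [0, 3, -3, -3, -3, -3, 0, 3, -3, -3]
n = len(sample)
k = sum(1 for x in sample if x in (-3, 3))  # number of x_i in {-3, 3}

def likelihood(t):
    return (1/3 - t) ** k * (1/3 + 2 * t) ** (n - k)

step = (1/3 - (-1/6)) / 100_000
grid = [-1/6 + i * step for i in range(100_001)]
t_hat = max(grid, key=likelihood)

closed_form = (n - k) / (2 * n) - 1/6  # closed-form MLE in terms of k
print(abs(t_hat - closed_form) < 1e-4)  # True
```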