Rigorous derivation of MLE in two-parameter case when second derivative test fails


I have a question about deriving the Maximum Likelihood Estimator (MLE) when several parameters must be estimated simultaneously and the support of the distribution depends on a parameter — in particular, the two-parameter case. I am taking a routine exercise as an example; nothing is special about this specific problem, and the setup is all too familiar:

Suppose $(X_1,X_2,\ldots,X_n)$ is a random sample drawn from a distribution with pmf

$$f(x;\alpha,\theta)=(1-\theta)\theta^{x-\alpha}\mathbf1_{x\,\in\,\{\alpha,\alpha+1,\ldots\}}$$

where both $\alpha\in\mathbb R$ and $\theta\in(0,1)$ are unknown. I am to derive the MLE of the parameter $(\alpha,\theta)$.

The likelihood function given the sample $(x_1,x_2,\ldots,x_n)$ is

\begin{align} L(\alpha,\theta)&=(1-\theta)^n\theta^{\sum_{i=1}^n x_i-n\alpha}\,\mathbf1_{x_1,x_2,\ldots,x_n\,\in\,\{\alpha,\alpha+1,\ldots\}} \\ &=(1-\theta)^n\theta^{\sum_{i=1}^n x_i-n\alpha}\,\mathbf1_{x_{(1)}\,\ge\,\alpha}\,,\qquad \alpha\in\mathbb R\,,\ \theta\in(0,1), \end{align}

where $x_{(1)}=\min\{x_1,x_2,\ldots,x_n\}$ is the smallest order statistic.

It is clear from the Factorization theorem that a sufficient statistic for $(\alpha,\theta)$ is $$T(X_1,X_2,\ldots,X_n)=\left(X_{(1)},\sum_{i=1}^n X_i\right)$$

So if our MLE is unique, we expect it to be a function of $T$.

Now keeping $\theta$ fixed, the likelihood function is of the form

$$L(\alpha,\theta)\propto \frac{\mathbf1_{x_{(1)}\,\ge\,\alpha}}{\theta^{n\alpha}}$$

Since $\theta\in(0,1)$, we have $\ln\theta<0$, so $\theta^{-n\alpha}=e^{-n\alpha\ln\theta}$ is increasing in $\alpha$; that is, for fixed $\theta$, $L$ is an increasing function of $\alpha$ wherever the indicator equals $1$.

This implies that for each fixed $\theta$, the likelihood is maximized by taking $\alpha$ as large as possible subject to the constraint $\alpha\le x_{(1)}$. So the MLE of $\alpha$ should be

$$\hat\alpha(X_1,X_2,\ldots,X_n)=X_{(1)}$$
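As a quick numerical sanity check (not a proof), one can evaluate the log-likelihood over a grid of $\alpha$ values for a fixed $\theta$ on a simulated sample and confirm it increases all the way up to $x_{(1)}$. The values $\alpha_0=2.5$, $\theta_0=0.6$, $n=50$, and $\theta_{\text{fixed}}=0.4$ below are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" values, chosen only for this check
alpha0, theta0, n = 2.5, 0.6, 50

# X - alpha0 = number of failures before the first success (success prob 1 - theta0),
# so that P(X = alpha0 + k) = (1 - theta0) * theta0**k for k = 0, 1, 2, ...
x = alpha0 + (rng.geometric(1 - theta0, size=n) - 1)

def loglik(alpha, theta, x):
    """Log-likelihood; -inf once the support constraint alpha <= x_(1) fails."""
    if alpha > x.min():
        return -np.inf
    return len(x) * np.log(1 - theta) + (x.sum() - len(x) * alpha) * np.log(theta)

# For a fixed theta, ell should increase in alpha all the way up to x_(1)
theta_fixed = 0.4  # arbitrary
alphas = np.linspace(x.min() - 3, x.min(), 200)
vals = [loglik(a, theta_fixed, x) for a in alphas]

assert all(v1 <= v2 for v1, v2 in zip(vals, vals[1:]))  # monotone in alpha
assert alphas[np.argmax(vals)] == x.min()               # maximized at x_(1)
print("grid argmax over alpha:", alphas[np.argmax(vals)], "  x_(1):", x.min())
```

The grid argmax lands exactly on $x_{(1)}$, consistent with the monotonicity argument above.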

Now I can differentiate the likelihood to obtain the MLE of $\theta$ in terms of the MLE of $\alpha$.

The log-likelihood is

\begin{align} \ell(\alpha,\theta)&=n\ln(1-\theta)+\left(\sum_{i=1}^n x_i-n\alpha\right)\ln\theta+\ln\left(\mathbf1_{x_{(1)}\,\ge\,\alpha}\right) \\ \implies\frac{\partial\ell}{\partial\theta}&=-\frac{n}{1-\theta}+\frac{1}{\theta}\left(\sum_{i=1}^n x_i-n\alpha\right) \end{align}

And

\begin{align} \frac{\partial\ell}{\partial\theta}=0&\iff \frac{1}{\theta}\left(\sum_{i=1}^n x_i-n\alpha\right)=\frac{n}{1-\theta} \\ &\iff \theta=\frac{\sum\limits_{i=1}^n x_i-n\alpha}{n+\sum\limits_{i=1}^n x_i-n\alpha}=\frac{\bar x-\alpha}{1+\bar x-\alpha} \end{align}
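One gap worth closing here: for fixed $\alpha$, this stationary point is indeed a maximum, since $\ell$ is strictly concave in $\theta$:

$$\frac{\partial^2\ell}{\partial\theta^2}=-\frac{n}{(1-\theta)^2}-\frac{1}{\theta^2}\left(\sum_{i=1}^n x_i-n\alpha\right)<0,$$

because $\sum_{i=1}^n x_i-n\alpha\ge 0$ whenever the indicator equals $1$. So (assuming $\bar x>\alpha$, i.e. not every observation equals $\alpha$) the stationary point is the unique global maximizer of $\theta\mapsto\ell(\alpha,\theta)$ on $(0,1)$.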

So the MLE of $\theta$ could be

$$\hat\theta(X_1,X_2,\ldots,X_n)=\frac{\overline X-\hat\alpha}{1+\overline X-\hat\alpha}$$
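As a further sanity check of the pair, one can compare the closed-form candidates $(\hat\alpha,\hat\theta)$ against a brute-force grid maximization of $\ell$ on a simulated sample. The true values, sample size, and grid resolution below are arbitrary assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary "true" values for the simulation
alpha0, theta0, n = 1.0, 0.5, 200
x = alpha0 + (rng.geometric(1 - theta0, size=n) - 1)

def loglik(alpha, theta, x):
    if alpha > x.min():
        return -np.inf  # support constraint violated
    return len(x) * np.log(1 - theta) + (x.sum() - len(x) * alpha) * np.log(theta)

# Closed-form candidates derived above
alpha_hat = x.min()
theta_hat = (x.mean() - alpha_hat) / (1 + x.mean() - alpha_hat)

# Brute-force maximization over an (alpha, theta) grid as an independent check
alphas = np.linspace(x.min() - 2, x.min(), 101)
thetas = np.linspace(0.01, 0.99, 99)
best = max(((a, t) for a in alphas for t in thetas),
           key=lambda p: loglik(p[0], p[1], x))

print("closed form:", (alpha_hat, theta_hat))
print("grid search:", best)
assert loglik(alpha_hat, theta_hat, x) >= loglik(best[0], best[1], x)
```

The grid winner agrees with the closed form up to the grid resolution, which is reassuring but of course does not substitute for the analytic argument the question is about.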

Finally, I have to conclude that $(\hat\alpha,\hat\theta)$ is the MLE of $(\alpha,\theta)$.

Since this is a maximization problem in two variables, to solve the problem rigorously, I should be verifying that the following actually holds for all $(\alpha,\theta)$:

$$\ell(\hat\alpha,\hat\theta)\ge \ell(\alpha,\theta)\tag{1}$$

Am I correct in saying that I do not have the second partial derivative test as an option here? The partial derivative of $\ell(\alpha,\theta)$ with respect to $\alpha$ does not exist at $\alpha=x_{(1)}$ (the likelihood drops to $0$ as soon as $\alpha>x_{(1)}$), so the Hessian matrix is not even defined at $(\hat\alpha,\hat\theta)$. Hence I cannot 'prove' that the Hessian is negative definite at $(\hat\alpha,\hat\theta)$ to conclude that $(\hat\alpha,\hat\theta)$ is the point of global maximum.

My question is:

To derive the MLE rigorously, is it enough if I stop after finding $\hat\alpha$ and $\hat\theta$ separately and just state that $(\hat\alpha,\hat\theta)$ is the MLE of $(\alpha,\theta)$, or do I have to show that $(1)$ holds? And do I have other calculus tools at hand to solve the problem efficiently?
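For concreteness, the alternative argument I have in mind would chain the two one-dimensional maximizations, avoiding the Hessian entirely: for every $(\alpha,\theta)$ with $\alpha\le x_{(1)}$ and $\theta\in(0,1)$,

$$\ell(\alpha,\theta)\le\ell(\hat\alpha,\theta)\le\ell(\hat\alpha,\hat\theta),$$

where the first inequality is the monotonicity in $\alpha$ (which holds for every fixed $\theta$, and $\hat\alpha=x_{(1)}$ does not depend on $\theta$), and the second is the concavity of $\theta\mapsto\ell(\hat\alpha,\theta)$. Is this profile-likelihood style argument considered a rigorous substitute for the second derivative test?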

A very common example where a similar question arises is the two-parameter exponential distribution. Also see this post, where it is apparently shown that the second derivative test can still be applied.