Does it make sense to estimate a parameter that is not in the PDF?


I have been reading some posts about the proof of the invariance of the MLE because I did not fully understand the proof given in *Statistical Inference* by Casella and Berger on page 320.

My doubt is the following: if we want to estimate a parameter, shouldn't that parameter appear in the model? The maximum likelihood method assumes that the parameter appears explicitly in the model.

Here is the start of the proof; the transformation is $\eta = \tau(\theta)$.


To be precise, my question is: shouldn't

$$L(\eta \mid X)$$

be the function to maximize, instead of the function $L(\theta \mid X)$?


1 Answer


The parameter $\theta$ (which may be a scalar or vector), for the purposes of discussion, completely specifies the model in question. For instance, we might say $$\theta = (\mu, \sigma^2)$$ for a normal distribution with unknown mean $\mu$ and variance $\sigma^2$. Or, if the variance is known, $\theta = \mu$.

Then any function of that parameter, say $\eta = \tau(\theta)$, might not be invertible, but it is still "in the model" in the sense that inference on $\eta$ is deterministically related to inference on $\theta$. For instance, we could be interested in a discretization of $\theta$: $$\eta = \lfloor \theta \rfloor,$$ where $\tau = \lfloor \cdot \rfloor$ is the greatest integer function. It's not invertible--if I told you I estimated $\hat\eta = 3$, all that tells you is that the corresponding estimate of $\theta$ is $\hat \theta \in [3, 4)$. It is not required that we estimate $\theta$ itself. We estimate what is of interest.
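The floor example above can be sketched numerically. This is a minimal illustration (the true value $3.6$ and sample size are my own choices, not from the answer): the MLE of $\theta = \mu$ for $N(\mu, 1)$ data is the sample mean, and by invariance the MLE of $\eta = \lfloor \theta \rfloor$ is simply the floor of that estimate.

```python
import numpy as np

# Illustrative setup (true mu = 3.6 and n = 1000 are assumptions):
# for N(mu, 1) data, the MLE of theta = mu is the sample mean.
rng = np.random.default_rng(0)
x = rng.normal(3.6, 1.0, size=1000)

theta_hat = x.mean()           # MLE of theta
eta_hat = np.floor(theta_hat)  # by invariance, MLE of eta = floor(theta)

# Knowing only eta_hat tells us theta_hat lies in [eta_hat, eta_hat + 1).
print(theta_hat, eta_hat)
```

Note that $\hat\eta$ is obtained directly from $\hat\theta$ without re-running any optimization; that is exactly what the invariance property buys.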

As for your other question, it must be $L^*(\eta \mid X)$, because this is the (induced) likelihood of $\eta$ with respect to the sample and is literally what we are seeking to maximize.

Let's put it another way. Say that the parameter of interest was $\theta = \mu$ for a normal random variable with known variance $\sigma^2 = 1$. We know that the sample mean is the MLE; $$\hat \theta = \bar X$$ maximizes the likelihood $L(\theta \mid X)$.

But what if I told you that the mean parameter $\mu$ is actually a transformation of an underlying parameter $\psi$, which I call "angle," and obeys the relationship $$\mu = \tan \psi?$$ Maybe you've never heard of such a parametrization, and all this time, you've been doing inference on $\mu$ directly. You've constructed your likelihoods based on $\mu$. Nothing was amiss. And when you look at $L$, you're not looking at $L(\psi \mid X)$.
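To see that maximizing over $\psi$ changes nothing, here is a sketch of the "angle" reparametrization (true mean $0.8$ and grid choices are assumptions for illustration): maximizing the induced likelihood over $\psi$ gives a $\hat\psi$ with $\tan\hat\psi$ equal to the sample mean, i.e. the usual MLE of $\mu$.

```python
import numpy as np

# Assumed setup for illustration: N(mu, 1) data with true mu = 0.8,
# reparametrized as mu = tan(psi) with psi in (-pi/2, pi/2).
rng = np.random.default_rng(2)
x = rng.normal(0.8, 1.0, size=500)

psi_grid = np.linspace(-1.5, 1.5, 3001)  # grid inside (-pi/2, pi/2)
mu = np.tan(psi_grid)
# Induced log-likelihood of psi, up to an additive constant.
log_lik = -0.5 * ((x[None, :] - mu[:, None]) ** 2).sum(axis=1)
psi_hat = psi_grid[np.argmax(log_lik)]

# Invariance: tan(psi_hat) matches the sample mean (the MLE of mu).
print(np.tan(psi_hat), x.mean())
```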

So why then, if we are talking about $\eta$, are you thinking the likelihood to be maximized in Theorem 7.2.10 should be $L(\theta \mid X)$ rather than $L^*(\eta \mid X)$? All I have done in my example above is reason in the other direction--i.e., $\mu$ is $\eta$ and $\theta$ is $\psi$ in my example.