Poisson parameter interval estimation vs. the CLT: why not simple?

I've never been much of a mathematician, and I'm now trying to catch up on some things I should have studied properly many years ago. So forgive me if the question is naive!

I'm relearning statistics, in particular estimation and hypothesis testing. I have one doubt about applying the central limit theorem to estimate the mean of a population. Here's my thinking:

  1. The CLT applies to any distribution. We can estimate the mean with the sample mean, and this estimator itself approximately follows a normal distribution $N(\mu,\sigma^2/n)$. The variance of the estimator is approximated by $s^2/n$, where $s$ is the sample standard deviation. And I can use the sample mean to approximate the underlying population's mean.
  2. If I have a few samples that I believe follow a Poisson distribution, such as the 27 values $2, 4, 1, 4, 0, 1, 3, 2, 1, 4, 1, 1, 0, 1, 2, 0, 2, 2, 0, 0, 3, 1, 0, 3, 4, 3, 2$, then I can estimate its parameter as the average of these values ($\hat\lambda = 1.74$) and use the approximation of the estimator's variance to create a 95% confidence interval around that. So my interval would be $\hat\lambda \pm 1.96 \cdot s/\sqrt n$, that is $1.74 \pm 1.96 \cdot 0.265$, giving a range from $1.22$ to $2.26$.
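For reference, the arithmetic in step 2 can be reproduced with a short Python sketch using only the standard library. The variable names are illustrative. The second interval is an assumption that goes beyond the steps above: it uses the Poisson property that the variance equals the mean, so the standard error becomes $\sqrt{\hat\lambda/n}$ instead of $s/\sqrt n$.

```python
import math
import statistics

# The 27 observations quoted in step 2.
counts = [2, 4, 1, 4, 0, 1, 3, 2, 1, 4, 1, 1, 0, 1,
          2, 0, 2, 2, 0, 0, 3, 1, 0, 3, 4, 3, 2]

n = len(counts)
lam_hat = statistics.mean(counts)   # sample mean, used as the estimate of lambda
s = statistics.stdev(counts)        # sample standard deviation (n - 1 denominator)
z = 1.96                            # approximate 97.5% quantile of N(0, 1)

# Generic CLT interval: standard error taken from the sample standard deviation.
se_clt = s / math.sqrt(n)
ci_clt = (lam_hat - z * se_clt, lam_hat + z * se_clt)

# Poisson-specific variant (assumption, not part of step 2):
# for a Poisson variable the variance equals the mean, so se = sqrt(lam_hat / n).
se_pois = math.sqrt(lam_hat / n)
ci_pois = (lam_hat - z * se_pois, lam_hat + z * se_pois)

print(f"lambda_hat = {lam_hat:.2f}")
print(f"CLT interval with s:        {ci_clt[0]:.2f} to {ci_clt[1]:.2f}")
print(f"interval with var = mean:   {ci_pois[0]:.2f} to {ci_pois[1]:.2f}")
```

The first interval matches the $1.22$ to $2.26$ range computed by hand.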

However, I never see such a simple process applied anywhere I look. Instead I see things I haven't looked into yet, such as MLE, which definitely look much more complex than this. But why? Is the above incorrect? Why would the CLT fail to deliver, since I'm only looking at estimating a mean? Am I misinterpreting what the mean is here? Or is it just that it's not very precise and the other methods give a smaller interval?

As you can see from my very unclear explanations, I'm at this stage where I understand a few things but it's quite superficial. If someone takes the time to answer, do not hesitate to do so in a "for dummies" style... I'll be super grateful.

Note: I might be particularly confused by the Poisson distribution because I'm trying to catch up on the topic from the Fundamentals of Biostatistics (Rosner) book, in which the Poisson parameter is $\mu=\lambda t$, rather than the typical $\lambda$ on its own that I see when searching the web (in which case $\mu=\lambda$). Whether or not the time element is included as a multiplier makes it a little unclear to me what is really the mean and the variance here.

Thanks!

P.

There are 1 best solutions below

Either of the two methods you describe is correct as far as it goes. If you do many statistical analyses of this type, on average you expect about $95\%$ of them to correctly give you an interval containing the true mean.

(For the CLT, the story is a bit more complicated; it does not apply to every distribution, since it requires a finite variance. It applies to the Poisson distribution, though, and better than usual, because the Poisson distribution is approximately normal when $\lambda$ is large. So we can treat it as a way to simplify some calculations in this case.)
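The coverage claim can be checked empirically. The sketch below is a minimal simulation, not a definitive benchmark: it repeatedly draws Poisson samples with a known $\lambda$, builds the $\hat\lambda \pm 1.96\, s/\sqrt n$ interval each time, and reports how often the interval contains the true value. The function names (`poisson_draw`, `clt_interval`, `coverage`), the seed, and the trial count are all illustrative choices; the sampler is Knuth's multiplication method, used here only because the Python standard library has no built-in Poisson generator.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def clt_interval(sample, z=1.96):
    """Normal-approximation interval: mean +/- z * s / sqrt(n)."""
    m = statistics.mean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return m - half_width, m + half_width


def coverage(lam, n, trials, seed=0):
    """Fraction of simulated intervals that contain the true lam."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [poisson_draw(rng, lam) for _ in range(n)]
        lo, hi = clt_interval(sample)
        if lo <= lam <= hi:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Same sample size and roughly the same rate as in the question.
    print(f"estimated coverage: {coverage(1.74, 27, 5000):.3f}")
```

Because the interval rests on an approximation, the simulated coverage should land near the nominal $95\%$ rather than exactly on it, and it can be compared across different values of `lam` and `n`.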

MLE and Bayesian estimators are more complicated and more precise answers to more complicated questions. In particular, the posterior distribution you get from a Bayesian method can be used to answer any kind of question along the lines of "what is the probability that the rate $\lambda$ satisfies $X$?"

In addition to much more complicated calculations, things are more complicated because you need more input to make it work: a prior distribution for how likely you thought values of $\lambda$ were before doing the experiment. (But you can always use an uninformative prior if you have no idea, which is in effect what MLE does: under a flat prior, the most probable value of the posterior coincides with the MLE.) This, too, is valuable: it makes it easy to combine several experiments, just by taking the output of one as input into the other.
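To make the "output of one experiment as input to the next" idea concrete, here is a minimal sketch. It assumes a Gamma prior on $\lambda$, which is a choice made for this illustration because it is conjugate to the Poisson likelihood: a Gamma(shape, rate) prior updated with $n$ counts summing to $S$ gives a Gamma(shape $+ S$, rate $+ n$) posterior. The class name `GammaBelief` and the split of the data into two "experiments" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape, rate) belief about a Poisson rate (illustrative helper)."""
    shape: float
    rate: float

    def update(self, counts):
        # Conjugate update: add the total count to the shape
        # and the number of observations to the rate.
        return GammaBelief(self.shape + sum(counts), self.rate + len(counts))

    @property
    def mean(self):
        return self.shape / self.rate

    @property
    def mode(self):
        # Mode of a Gamma distribution with shape >= 1.
        return (self.shape - 1) / self.rate if self.shape >= 1 else 0.0


counts = [2, 4, 1, 4, 0, 1, 3, 2, 1, 4, 1, 1, 0, 1,
          2, 0, 2, 2, 0, 0, 3, 1, 0, 3, 4, 3, 2]

# Flat (improper) prior on lambda: shape 1, rate 0. This is an assumption.
prior = GammaBelief(shape=1.0, rate=0.0)

# All data at once ...
batch = prior.update(counts)

# ... or as two experiments, feeding the first posterior in as the next prior.
first, second = counts[:13], counts[13:]
sequential = prior.update(first).update(second)

print(batch == sequential)                   # same posterior either way
print(batch.mode, sum(counts) / len(counts)) # posterior mode vs. sample mean
print(batch.mean)
```

With the flat prior, the posterior mode equals the sample mean, which is also the maximum likelihood estimate of $\lambda$ for Poisson data; an informative prior would pull the estimate toward prior beliefs instead.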