Sample mean converging to normal much faster than expected.


I am taking a non-normal distribution (Poisson, Exponential or Uniform etc.) and I simulate thousands of experiments for small sample sizes ($n=1,...,10$). I calculate the 95%-confidence interval each time, using: $$I_{0.95}=\bigg(\hat{\mu}+z_{0.025}\frac{\sigma}{\sqrt{n}}, \ \ \hat{\mu}-z_{0.025}\frac{\sigma}{\sqrt{n}}\bigg)$$ where $\hat{\mu}$ is my estimate for the mean (sample mean). I am also always assuming $\sigma$ is known (I am just taking it from the distribution I chose at the beginning).
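The experiment described above can be sketched roughly as follows. This is only a minimal illustration, assuming NumPy and a Uniform(0, 1) population; the names `coverage_known_sigma` and `uniform_sampler`, the seed, and the chosen sample sizes are made up for the example and are not the code actually used for the question.

```python
import numpy as np

Z = 1.959963984540054  # upper 2.5% point of the standard normal


def uniform_sampler(rng, size):
    """Draw Uniform(0, 1) observations (illustrative choice of population)."""
    return rng.uniform(0.0, 1.0, size)


def coverage_known_sigma(sampler, mu, sigma, n, n_sims=10_000, seed=0):
    """Fraction of known-sigma z-intervals that contain the true mean mu."""
    rng = np.random.default_rng(seed)
    samples = sampler(rng, (n_sims, n))      # one row per simulated experiment
    means = samples.mean(axis=1)             # sample mean of each experiment
    half_width = Z * sigma / np.sqrt(n)      # same half-width for every row
    return float(np.mean(np.abs(means - mu) <= half_width))


# Uniform(0, 1) has mean 1/2 and standard deviation sqrt(1/12).
for n in (1, 2, 5, 10):
    cov = coverage_known_sigma(uniform_sampler, 0.5, np.sqrt(1 / 12), n)
    print(f"n={n:2d}  coverage={cov:.3f}")
```

Swapping `uniform_sampler` for a Poisson or exponential sampler (with the matching `mu` and `sigma`) gives the other cases mentioned above.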

Now, I have noticed a strange behaviour: the confidence intervals contain the true mean in approximately 95% of the simulations, whatever the sample size. I thought that we should get a value around 95% only after, say, $n>30$.

Is this because the variance is known?

Below is a graph for $n=2,3,\dots,100$, with 10,000 simulations for each $n$, using the uniform distribution.

[Figure: simulation results for the uniform distribution]

The picture below refers to @grand_chat's comment.

[Figure: plot referred to in the comment]


There are 2 best solutions below

BEST ANSWER

Try computing confidence intervals for samples from distributions that deviate more from normality. For example, try generating observations from a Poisson distribution with $\lambda=0.001$. You'll find that your $z$-based intervals will not have the advertised 95% coverage.

In general, you'll find that the normal theory works pretty well if the distribution you're sampling from is reasonably symmetric, so aim for samples from non-symmetric distributions, or distributions that are almost a point mass, such as Poisson with tiny $\lambda$. Keep away from the uniform distribution; the normal approximation kicks in pretty quickly no matter how you scale it.
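Here is a rough sketch of the Poisson suggestion, assuming NumPy and the same known-$\sigma$ interval as in the question. The function name `poisson_coverage`, the seed, and the sample sizes are arbitrary choices for illustration.

```python
import numpy as np

Z = 1.959963984540054  # upper 2.5% point of the standard normal


def poisson_coverage(lam, n, n_sims=10_000, seed=0):
    """Simulated coverage of the known-sigma z-interval for Poisson(lam) data."""
    rng = np.random.default_rng(seed)
    samples = rng.poisson(lam, size=(n_sims, n))   # one row per experiment
    means = samples.mean(axis=1)
    half_width = Z * np.sqrt(lam) / np.sqrt(n)     # sigma = sqrt(lam) is known
    return float(np.mean(np.abs(means - lam) <= half_width))


for n in (10, 100):
    print(f"n={n:3d}  coverage={poisson_coverage(0.001, n):.3f}")
```

With $\lambda$ this small, almost every sample consists entirely of zeros, so the sample mean takes only a handful of distinct values and the coverage is driven by how often the all-zero sample occurs rather than by the normal approximation.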

Knowing the variance takes some of the 'noise' out of the confidence interval. So yes, this will improve coverage over using an estimator for the standard deviation.
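To see the effect of knowing $\sigma$, one can build both intervals from the same samples. The sketch below assumes NumPy, an Exponential(1) population (mean 1, standard deviation 1), and $n=5$; the function name `compare_coverage` and these settings are illustrative only. The second interval plugs the sample standard deviation into the same $z$ formula.

```python
import numpy as np

Z = 1.959963984540054  # upper 2.5% point of the standard normal


def compare_coverage(n, n_sims=10_000, seed=0):
    """Coverage of z-intervals with known sigma vs. plug-in sample sd."""
    rng = np.random.default_rng(seed)
    mu, sigma = 1.0, 1.0                       # Exponential(1): mean 1, sd 1
    samples = rng.exponential(scale=1.0, size=(n_sims, n))
    means = samples.mean(axis=1)
    sds = samples.std(axis=1, ddof=1)          # sample sd, one per experiment
    err = np.abs(means - mu)
    known = float(np.mean(err <= Z * sigma / np.sqrt(n)))
    estimated = float(np.mean(err <= Z * sds / np.sqrt(n)))
    return known, estimated


known, estimated = compare_coverage(n=5)
print(f"known sigma: {known:.3f}   estimated sigma: {estimated:.3f}")
```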

ANSWER

Since you are dealing with non-normal distributions, shouldn't you be using tests for non-normally distributed data? Also, when the sample size is less than 30, you would usually conduct a t-test. In this case, the t-test equivalents for non-normally distributed data are the following: the Mann-Whitney test, Mood's median test, and the Kruskal-Wallis test.