Sampling distribution of sample trimmed (truncated) mean

It is elementary probability theory that the sample mean of an i.i.d. sample follows a normal distribution if the population distribution is normal. But what about the trimmed mean? Is there any result on its distribution for an i.i.d. sample of size $n$? (For a normal or a general population distribution.)

My only idea is to use the results for the distribution of order statistics (summing them and taking their non-independence into account), but that seems exceedingly complicated. Perhaps there is an easier way?

EDIT (2021-08-12): See the answer here.

You are correct that the distribution theory is of an advanced nature. An important paper on this topic is Stephen M. Stigler, Annals of Statistics, Vol. 1, No. 3 (1973); an open version is here. This other Stigler paper is also relevant.

However, in general terms, if the population distribution is normal (or any other continuous unimodal distribution that has a mean and variance and whose density decreases monotonically towards its tail(s)), the distribution of the trimmed mean approaches normality as the sample size increases. (The condition I have given can be weakened, but it covers the vast majority of distributions used in practical modeling.)

Various versions of the trimmed mean eliminate different percentages of their observations from both tails. A common choice is the 5% trimmed mean that cuts 5% from each tail and averages the 'middle' 90% of the data. As trimming approaches 50% from each tail, the trimmed mean becomes the median.
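To make the definition concrete, here is a minimal Python sketch of a symmetric trimmed mean. The function name `trimmed_mean`, the rule of dropping `floor(proportion * n)` observations from each tail, and the toy data are assumptions made for illustration; other conventions for handling fractional counts exist.

```python
import math


def trimmed_mean(data, proportion):
    """Mean after dropping floor(proportion * n) values from each tail."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    if not data:
        raise ValueError("data must not be empty")
    xs = sorted(data)
    k = math.floor(proportion * len(xs))  # observations cut from each tail
    kept = xs[k:len(xs) - k]              # the 'middle' of the ordered sample
    return sum(kept) / len(kept)


# Toy data (hypothetical): nine moderate values and one wild one.
sample = [2, 4, 5, 6, 7, 8, 9, 10, 11, 300]
plain = sum(sample) / len(sample)     # pulled upward by the value 300
trimmed = trimmed_mean(sample, 0.10)  # drops 2 and 300, averages the rest
print(plain, trimmed)                 # 36.2 7.5
```

With heavy trimming, `trimmed_mean(sample, 0.4)` keeps only the two middle values and so returns the sample median, 7.5. SciPy's `scipy.stats.trim_mean` provides a similar calculation if SciPy is available.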

The degree of trimming may affect the rate at which normality is reached, but the tendency is nevertheless towards normal. There is even a 'central limit theorem' for medians.
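For reference, the usual statement of that result for the median is sketched below. The notation is an assumption introduced here: $M_n$ is the sample median of $n$ i.i.d. observations, $m$ is the population median, and $f$ is the population density, taken to be continuous and positive at $m$.

$$\sqrt{n}\,(M_n - m) \;\xrightarrow{d}\; \mathsf{Norm}\!\left(0,\ \sigma^2 = \frac{1}{4 f(m)^2}\right).$$

For a normal population with standard deviation $\sigma$, we have $f(m) = 1/(\sigma\sqrt{2\pi})$, so the limiting variance is $\pi\sigma^2/2 \approx 1.57\,\sigma^2$, compared with $\sigma^2$ for the sample mean.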

When there is symmetry all around (symmetrical population distribution and trimming the same percentage from each tail), the expectation of the trimmed mean is the same as the population mean. The variance depends on the percentage of trimming, the shape of the population distribution, and the sample size.
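In this symmetric case, the large-sample variance is commonly quoted in the following form. Treat it as a sketch of the standard asymptotic result rather than a full statement of the theorem. The notation is an assumption introduced here: $T_n$ is the trimmed mean with a fraction $\alpha < 1/2$ removed from each tail, $F$ is the population CDF, symmetric about $\mu$, and $\xi_\alpha$ and $\xi_{1-\alpha}$ are its $\alpha$ and $1-\alpha$ quantiles, assumed to be uniquely determined.

$$\sqrt{n}\,(T_n - \mu) \;\xrightarrow{d}\; \mathsf{Norm}(0,\ \sigma_\alpha^2), \qquad \sigma_\alpha^2 = \frac{1}{(1-2\alpha)^2}\left[\int_{\xi_\alpha}^{\xi_{1-\alpha}} (x-\mu)^2\, dF(x) \;+\; 2\alpha\,(\xi_{1-\alpha}-\mu)^2\right].$$

Setting $\alpha = 0$ gives the ordinary variance $\sigma^2$ of the population, and letting $\alpha \to 1/2$ gives $1/(4 f(\mu)^2)$, the limiting variance for the median when the density $f$ is continuous and positive at $\mu$.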

Because of the messiness of the distribution theory, it is common in practice to do simulation studies to determine the distribution of the trimmed mean in a particular situation.

For example, suppose the parent distribution is a mixture of 90% $Norm(100, \sigma = 10)$ and 10% $Norm(130, \sigma=50),$ and we have a sample of size $n=20.$

The usual terminology is that the population with mean 100 has been "10% contaminated" by observations with a larger mean and standard deviation. Ten percent is a fairly high level of contamination. The contaminated distribution is far from normal, with very heavy tails, and right skewness.

A simple simulation with 100,000 samples of size 20 shows that $E(\bar X) = 103$ and $SD(\bar X) = 18.7$ for the original data. For the trimmed data (denoted by $Y$) we have $E(\bar Y) = 101.6$ and $SD(\bar Y) = 11.3.$ Histograms of both $\bar X$ and $\bar Y$ are "nearly" normal, even with the relatively small sample size $n = 20$, but both are slightly skewed to the right.
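For anyone wanting to rerun an experiment of this kind, here is a minimal Python sketch using only the standard library. It is not the code behind the figures above; the function names, the seed, and the default of 20,000 replications are assumptions chosen for illustration. As a yardstick, the mixture has $E(X) = 0.9(100) + 0.1(130) = 103$ and $Var(X) = 0.9(10^2) + 0.1(50^2) + (0.9)(0.1)(130-100)^2 = 421,$ so in theory $SD(\bar X) = \sqrt{421/20} \approx 4.6$ when $n = 20.$ That is much smaller than the 18.7 quoted above, which is closer to the SD of a single observation ($\sqrt{421} \approx 20.5$), so the quoted SDs may describe the spread within samples rather than the spread of the means. The sketch reports both kinds of SD so they can be compared.

```python
import random
import statistics


def draw_contaminated(rng, n, p_contam=0.10):
    """n draws: Norm(130, sd=50) with probability p_contam, else Norm(100, sd=10)."""
    return [
        rng.gauss(130, 50) if rng.random() < p_contam else rng.gauss(100, 10)
        for _ in range(n)
    ]


def trimmed_mean(data, proportion):
    """Mean after dropping int(proportion * n) values from each tail."""
    xs = sorted(data)
    k = int(proportion * len(xs))  # with n = 20 and 5% trimming, k = 1
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def simulate(reps=20_000, n=20, proportion=0.05, seed=2021):
    """Summarize plain and trimmed means over `reps` simulated samples."""
    rng = random.Random(seed)
    means, trimmed, within_sd = [], [], []
    for _ in range(reps):
        x = draw_contaminated(rng, n)
        means.append(statistics.fmean(x))
        trimmed.append(trimmed_mean(x, proportion))
        within_sd.append(statistics.stdev(x))  # spread inside one sample
    return {
        "mean_of_means": statistics.fmean(means),
        "sd_of_means": statistics.stdev(means),      # SD of the sample mean
        "mean_of_trimmed": statistics.fmean(trimmed),
        "sd_of_trimmed": statistics.stdev(trimmed),  # SD of the trimmed mean
        "avg_within_sample_sd": statistics.fmean(within_sd),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.2f}")
```

Histograms of the simulated `means` and `trimmed` lists (returned from a modified `simulate`, for instance) would show the approximate shape of each sampling distribution.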

Trimming tends to put $\bar Y$ closer to the mean 100 of the 'main' population than is true for the untrimmed mean. Similarly, trimming has eliminated some, but not all, of the 'excess' standard deviation due to contamination. We see that 5% trimming (which, with $n = 20,$ removes one observation from each tail) has partly mitigated the effects of serious contamination, but far from completely.