Is it the sampling process, not the population distribution, that makes the central limit theorem (CLT) work?


I find it a bit "mathemagical" that one can use a sample to estimate the parameters of the assumed normal distribution of the true population, even if the sample is not at all close to normally distributed!

So we use the sample data only to fill in the few inputs of a Student's t-test: the sample mean, the sample standard deviation, and the number of observations (adjusted for the degrees of freedom used up). From there on we ignore the actual sample data; it has all been reduced to three figures, even if the sample's frequency histogram shows a ragged pack of bars here and there. Instead, we use the beautifully symmetric curve of the t-distribution to actually test our hypothesis.
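To make the "three figures" point concrete, here is a minimal sketch of how I understand a one-sample t statistic to be computed. The data, the hypothesized mean `mu0`, and the helper names `summarize` and `t_statistic` are all made up for illustration; this is not taken from any particular textbook or library.

```python
import math
import statistics


def summarize(sample):
    """Reduce a sample to the three figures the t-test uses."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return mean, sd, n


def t_statistic(mean, sd, n, mu0):
    """One-sample t statistic and its degrees of freedom.

    Only the three summary figures and the hypothesized mean enter here;
    the shape of the sample's histogram plays no further role.
    """
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Made-up, visibly skewed data (illustration only).
sample = [0.2, 0.3, 0.5, 0.9, 1.1, 1.4, 2.0, 3.7, 5.2, 9.8]
mean, sd, n = summarize(sample)
t, df = t_statistic(mean, sd, n, mu0=1.0)
print(f"mean={mean:.3f} sd={sd:.3f} n={n} t={t:.3f} df={df}")
```

Any two samples with the same mean, standard deviation, and size would give exactly the same `t` and `df` here, which is the part that feels like magic to me.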

I heard today, from a professional (not academic) source, that this is because THE SAMPLING PROCESS is assumed to be normally distributed, so it doesn't depend on how the actual data in the population are distributed.

I do see how the sampling process can be reliably randomized and that it will then follow the normal distribution with respect to what observations are picked up by the sample. But I wish to see that line of reasoning developed.
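The kind of experiment I have in mind could be sketched like this: draw many random samples from a clearly non-normal population and look at how the sample means, rather than the individual observations, are distributed. The exponential population, the sample size of 50, the number of repetitions, and the helper names are all assumptions chosen purely for illustration.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def sample_means(rng, sample_size, repetitions, rate=1.0):
    """Mean of each of `repetitions` random samples from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(sample_size)])
        for _ in range(repetitions)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Individual observations from the (strongly right-skewed) population.
raw = [rng.expovariate(1.0) for _ in range(5000)]

# Means of many independent samples of size 50 from the same population.
means = sample_means(rng, sample_size=50, repetitions=2000)

print("skewness of individual observations:", round(skewness(raw), 2))
print("skewness of sample means:           ", round(skewness(means), 2))
```

If I understand the claim correctly, the second number should come out much closer to zero than the first, even though every observation comes from the same skewed population.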

How come the true distribution of the population doesn't matter, but only the sampling process? And why, then, is a different distribution, for example the exponential, sometimes used instead? Presumably not because a different sampling process was used.