Why are we always assuming normality?


For a great number of statistical tests (ANOVA or discriminant analysis, for example), we suppose that the variables follow a normal distribution. How many times have I seen this assumption?

But I recently learnt that we can calculate the kurtosis (a measure of "tailedness") and the skewness of the variables and check whether both match the values for a normal distribution (3 for the kurtosis, or equivalently 0 for the excess kurtosis, and 0 for the skewness). Before that, I always assumed that my variables followed a normal distribution, without testing it.

Is this because I am a student and I just have to apply the theorems and calculations I learn during my studies, or is it because we always assume our variables follow a normal distribution?

Furthermore, one of my professors told me that with a large amount of data (roughly 1 million observations or more), we can just assume that the variables and their moment estimators follow a normal distribution. But when I calculated the kurtosis and skewness, I found enormous values, sometimes greater than 100. To me, this means my variables don't follow a normal distribution. So why does my professor assume they do?

I apologize for my poor English, and I hope my question is clear nonetheless.


Best answer

You ask several questions. In general, assuming normality without justification is plain wrong: a random variable can follow any distribution (exponential, Pareto, beta, gamma, and so on). In the same sense, if you have some data, they may come from a particular distribution, or they may be "random" or deterministic.

As for skewness and kurtosis, you can calculate them and see whether they match those of a normal distribution. Be aware that a match does not necessarily mean that your data come from a normal distribution. Finally, no amount of data allows you to assume that the data themselves are normal (why would it?). Most probably your professor was talking about the central limit theorem, which says something quite different.
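To make the skewness and kurtosis check concrete, here is a minimal sketch using only the Python standard library. The helper name `skewness_and_kurtosis`, the seed, and the sample sizes are my own choices for illustration, and the estimators are the plain moment ratios (not the bias-corrected versions some packages report). It compares a simulated normal sample with a simulated exponential one.

```python
import random


def skewness_and_kurtosis(xs):
    """Return (skewness, kurtosis) from plain central-moment ratios.

    Kurtosis here is the non-excess version, so a normal distribution
    gives a value near 3 (excess kurtosis near 0).
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2


rng = random.Random(0)  # fixed seed so the run is reproducible
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
exponential_sample = [rng.expovariate(1.0) for _ in range(100_000)]

normal_skew, normal_kurt = skewness_and_kurtosis(normal_sample)
expo_skew, expo_kurt = skewness_and_kurtosis(exponential_sample)

# Theory: normal has skewness 0 and kurtosis 3;
# exponential has skewness 2 and kurtosis 9.
print(f"normal:      skewness={normal_skew:.3f}, kurtosis={normal_kurt:.3f}")
print(f"exponential: skewness={expo_skew:.3f}, kurtosis={expo_kurt:.3f}")
```

The exponential sample stays far from the normal values however many points are drawn, which is the situation described in the question: more data only estimates the skewness and kurtosis more precisely, it does not pull them toward 0 and 3.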

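At a high level, the central limit theorem says: if you have a population with mean $\mu$ and finite standard deviation $\sigma$, and you repeatedly take sufficiently large random samples of size $n$ from the population with replacement, then the sample means will be approximately normally distributed, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. It is a statement about the sample means, not about the individual observations.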
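A small simulation can illustrate the difference between the data and the sample means. This is only a sketch under assumptions I picked for illustration: an exponential population with $\mu = \sigma = 1$, samples of size 200, and 5,000 repeated samples. The helper `skewness_and_kurtosis` is a hypothetical name defined in the block itself.

```python
import random
import statistics


def skewness_and_kurtosis(xs):
    """Return (skewness, non-excess kurtosis) from central-moment ratios."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


rng = random.Random(1)  # fixed seed so the run is reproducible
sample_size = 200       # n: observations per sample (assumed value)
num_samples = 5_000     # how many sample means we collect (assumed value)

# Raw observations from a strongly skewed population: Exp(1), mu = sigma = 1.
raw_draws = [rng.expovariate(1.0) for _ in range(100_000)]

# One sample mean per repeated sample of size n.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
    for _ in range(num_samples)
]

raw_skew, raw_kurt = skewness_and_kurtosis(raw_draws)
means_skew, means_kurt = skewness_and_kurtosis(sample_means)
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(f"raw data:     skewness={raw_skew:.3f}, kurtosis={raw_kurt:.3f}")
print(f"sample means: skewness={means_skew:.3f}, kurtosis={means_kurt:.3f}")
# CLT prediction for the sample means: mean mu = 1, sd sigma / sqrt(n).
print(f"mean of means={mean_of_means:.4f}, sd of means={sd_of_means:.4f}, "
      f"sigma/sqrt(n)={1.0 / sample_size ** 0.5:.4f}")
```

The raw draws keep the skewness and kurtosis of the exponential distribution, while the sample means land close to the normal values and to the spread $\sigma/\sqrt{n}$. For heavier-tailed populations the approximation can need a much larger $n$, and it does not apply at all when the variance is infinite.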