Fundamental question on test for normal distribution


I have a fundamental question on statistical tests, particularly tests for normal distribution. As I understand statistical tests in general, they have a null hypothesis $H_0$ (e.g. the samples were drawn from a normal distribution) and an alternative hypothesis $H_1$ (e.g. the samples were not drawn from a normal distribution). If the test is significant ($p < p_\alpha$), one can reject $H_0$ and assume that $H_1$ is true. However, if the test is not significant, one cannot automatically assume that $H_0$ is true.

Now, all tests for normal distribution that I have read about have an $H_0$ that the samples were drawn from a normal distribution. Hence, the only thing you can do with these tests is to assume that the samples were not drawn from a normal distribution if the test is significant. You can't assume that the samples were drawn from a normal distribution if the test is not significant. But that's what everybody seems to be doing.

Is there anything fundamentally wrong with my understanding of statistical tests? How can I "prove" that a given sample was drawn from a normal distribution?


There are 2 best solutions below

4
On

When the test statistic falls in the significant region, we reject the null hypothesis that the samples are drawn from a Gaussian distribution.

When the test statistic does not fall in the significant region, we fail to reject the null hypothesis that samples are drawn from a Gaussian distribution - we cannot say anything more.
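The two outcomes above can be sketched in code. This is only an illustration: the Jarque-Bera test is used as a stand-in normality test because its p-value has a simple closed form (it is not a test named in this thread), the function names are made up for the example, and both samples are constructed deterministically from quantiles rather than taken from real data.

```python
import math
from statistics import NormalDist, fmean


def jarque_bera_p_value(sample):
    """Asymptotic p-value of the Jarque-Bera normality test (H0: normal)."""
    n = len(sample)
    mean = fmean(sample)
    # Central moments of the sample.
    m2 = fmean([(x - mean) ** 2 for x in sample])
    m3 = fmean([(x - mean) ** 3 for x in sample])
    m4 = fmean([(x - mean) ** 4 for x in sample])
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Under H0, JB is approximately chi-squared with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    return math.exp(-jb / 2)


def decide(sample, alpha=0.05):
    """Return the only two statements the test allows us to make."""
    if jarque_bera_p_value(sample) < alpha:
        return "reject H0: evidence against normality"
    return "fail to reject H0: no conclusion about normality"


# Hypothetical, deterministic samples built from quantiles.
n = 200
probs = [(i + 0.5) / n for i in range(n)]
normal_like = [NormalDist().inv_cdf(p) for p in probs]  # normal quantiles
skewed = [-math.log(1 - p) for p in probs]              # exponential quantiles

print(decide(normal_like))
print(decide(skewed))
```

Note that `decide` never returns "the sample is normal": a non-significant result only means the data gave no reason to reject $H_0$.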

Even this 'failing to reject' has a lot of use cases.

For example, the output of an OLS regression reports a p-value for each estimated coefficient $\beta_i$. Each of these p-values comes from a test of the null hypothesis that $\beta_i = 0$. A small p-value means that this null hypothesis is rejected, i.e. a significant linear relationship is reported; a large one means that we fail to reject it.
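As a rough sketch of where such a p-value comes from, the snippet below fits a simple one-predictor regression by hand and tests $H_0: \beta_1 = 0$. Several things here are assumptions made for the illustration: the data are simulated, the function name is made up, and the p-value uses a normal approximation to the t distribution, which is only reasonable for fairly large samples. Statistical packages use the t distribution itself.

```python
import math
import random
from statistics import NormalDist, fmean


def slope_p_value(x, y):
    """Least-squares slope and an approximate two-sided p-value for H0: slope = 0."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and the standard error of the slope.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    z = slope / std_err
    # Normal approximation to the t distribution (assumes n is large).
    p = 2 * NormalDist().cdf(-abs(z))
    return slope, p


# Simulated data with a true slope of 2 (hypothetical values).
rng = random.Random(0)
n = 200
x = [i / n for i in range(n)]
y = [2 * xi + rng.gauss(0, 0.5) for xi in x]

slope, p = slope_p_value(x, y)
print(f"estimated slope = {slope:.3f}, p-value = {p:.3g}")
```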

With regards to your question, your understanding is correct.

One can never be 100% sure that a sample comes from a Gaussian distribution. We can only give a confidence level. Typically, the larger the sample size, the better the confidence level.
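A small simulation can illustrate the role of the sample size. The sketch below draws samples from a uniform distribution, which is certainly not Gaussian, and counts how often a normality test rejects $H_0$ for a small and for a large sample size. The choice of the uniform distribution, the sample sizes, the seed, and the use of the Jarque-Bera test are all assumptions made for this example.

```python
import math
import random
from statistics import fmean


def jarque_bera_p_value(sample):
    """Asymptotic p-value of the Jarque-Bera normality test (H0: normal)."""
    n = len(sample)
    mean = fmean(sample)
    m2 = fmean([(x - mean) ** 2 for x in sample])
    m3 = fmean([(x - mean) ** 3 for x in sample])
    m4 = fmean([(x - mean) ** 4 for x in sample])
    jb = n / 6 * ((m3 / m2 ** 1.5) ** 2 + (m4 / m2 ** 2 - 3) ** 2 / 4)
    # Chi-squared survival function with 2 degrees of freedom.
    return math.exp(-jb / 2)


def rejection_rate(n, trials=200, alpha=0.05, seed=0):
    """Fraction of uniform samples of size n for which normality is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.uniform(0.0, 1.0) for _ in range(n)]
        if jarque_bera_p_value(sample) < alpha:
            rejections += 1
    return rejections / trials


small = rejection_rate(20)
large = rejection_rate(500)
print(f"rejection rate with n=20:  {small:.2f}")
print(f"rejection rate with n=500: {large:.2f}")
```

With small samples the test rarely notices that the data are not Gaussian, so a non-significant result on a small sample says very little. With large samples a non-significant result is more informative, although it still does not prove normality.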

However, the normality tests readily available in statistical software differ from one another. Each test is 'good' at detecting certain kinds of departure from normality, depending on its statistical power against them.

8
On

You can assume that both hypotheses belong to the same distribution or population, because both hypotheses can be used to approximate that distribution.

An example: a company checks that the mean volume of water in its manufactured bottles is about 997 ml with variance $k$. After replacing the manufacturing machine with an identical one, it notes that the mean volume of water in a bottle is about 995 ml, again with variance $k$. You may assume that both populations follow a normal distribution. Test at the 5% level of significance whether the new mean can be accepted.

In this case you might end up concluding that the new mean can be accepted because $H_a$ is not supported, i.e. the test fails to reject $H_0$. The two means are very close, so one might assume that they come from the same distribution if both samples are sufficiently large.
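The bottle example does not state the variance $k$ or the number of bottles measured, so the sketch below plugs in hypothetical values ($k = 25$ and sample sizes of 10 and 30) purely to show the mechanics of a two-sided z-test with known variance. The function name is also made up for the illustration.

```python
import math
from statistics import NormalDist


def two_sided_z_test(sample_mean, mu0, variance, n):
    """z statistic and two-sided p-value for H0: mean = mu0, known variance."""
    z = (sample_mean - mu0) / math.sqrt(variance / n)
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p


k = 25  # hypothetical variance in ml^2; not given in the example
for n in (10, 30):  # hypothetical sample sizes; not given in the example
    z, p = two_sided_z_test(sample_mean=995, mu0=997, variance=k, n=n)
    verdict = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"n={n}: z={z:.2f}, p={p:.3f} -> {verdict}")
```

Under these assumed numbers the same 2 ml difference is not significant for the smaller sample but is significant for the larger one, so the conclusion depends on $k$ and on the sample size.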

To test whether both hypotheses come from the same distribution, you can use a chi-squared test. You can find more about the chi-squared test at https://en.wikipedia.org/wiki/Chi-squared_test