Tests for normal distribution

Given some data, is there a test to determine whether the data fit a normal distribution? (The mean and variance are not specified.)

The Shapiro-Wilk test is a formal test of normality in general for a random sample, without reference to specific numerical values of the mean $\mu$ or variance $\sigma^2.$
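One way to see why no values of $\mu$ or $\sigma^2$ are needed: the W statistic is unchanged when the data are shifted and rescaled, so the test only looks at the shape of the sample. A minimal R sketch; the seed, the sample size 40 and the transformation `3 * z + 7` are arbitrary choices made only for illustration.

```r
# W depends only on the shape of the sample, not on its location or scale.
set.seed(1)                            # arbitrary seed (illustrative)
z <- rnorm(40)                         # illustrative sample of size 40
res_raw   <- shapiro.test(z)
res_moved <- shapiro.test(3 * z + 7)   # shifted and rescaled copy of z

# Both comparisons should print TRUE (equal up to floating-point rounding).
all.equal(unname(res_raw$statistic), unname(res_moved$statistic))
all.equal(res_raw$p.value, res_moved$p.value)
```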

For example, vector x has observations from a normal population and vector y has observations from an exponential population.

set.seed(2020)
x = rnorm(100, 1, 1)
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -2.039   0.438   1.120   1.109   1.739   4.202 
shapiro.test(x)

        Shapiro-Wilk normality test

data:  x
W = 0.98906, p-value = 0.5895

The null hypothesis that the population is normal is not rejected at the 5% level: P-value = $0.59 > 0.05 = 5\%.$

y = rexp(100, 1)
summary(y)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.008152 0.329992 0.810329 1.175030 1.679118 4.713232 
shapiro.test(y)

        Shapiro-Wilk normality test

data:  y
W = 0.85789, p-value = 2.346e-08

The null hypothesis that the population is normal is rejected at the 5% level: P-value $\approx 0 < 0.05 = 5\%.$
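The 5%-level decision rule applied to x and y above can be wrapped in a small helper. `normality_decision()` is a hypothetical name introduced here, and the two inputs are deterministic stand-ins (evenly spaced quantiles of a normal and of an exponential distribution), not the random samples above.

```r
# Hypothetical helper: apply the 5%-level Shapiro-Wilk decision rule.
normality_decision <- function(sample, alpha = 0.05) {
  p <- shapiro.test(sample)$p.value
  if (p < alpha) "reject normality" else "do not reject normality"
}

# Deterministic illustrative inputs: 50 plotting-position quantiles
# of N(1, 1) and of an exponential distribution with rate 1.
normal_like <- qnorm(ppoints(50), mean = 1, sd = 1)
skewed      <- qexp(ppoints(50), rate = 1)

normality_decision(normal_like)   # "do not reject normality"
normality_decision(skewed)        # "reject normality"
```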

Also, as suggested in the comment by @MatthewPilling, one can look at a normal probability plot (Q-Q plot), in which normal data should roughly follow a straight line. (One should not be too fussy about the fit to a straight line for the lowest and highest values in the sample.)
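The points in a normal Q-Q plot are the sorted sample plotted against standard normal quantiles at the plotting positions `ppoints(n)`, and `qqline()` by default draws the line through the pairs of first and third quartiles. The sketch below recomputes both by hand for the x above (same seed and call). The correlation `r` of the plotted points is added here only as an informal summary of straightness; it is not part of the method described in this answer.

```r
set.seed(2020)
x <- rnorm(100, 1, 1)                   # same call as for x above

theo <- qnorm(ppoints(length(x)))       # N(0, 1) quantiles at plotting positions
emp  <- sort(x)                         # sample (empirical) quantiles

# qqnorm(plot.it = FALSE) returns the points without drawing them;
# they are in data order, so sort both coordinates before comparing.
q <- qqnorm(x, plot.it = FALSE)
all.equal(sort(q$x), theo)              # TRUE
all.equal(sort(q$y), emp)               # TRUE

# Line drawn by qqline(x): through the quartile pairs (probs = c(0.25, 0.75)).
qs        <- quantile(x, c(0.25, 0.75), names = FALSE)
slope     <- diff(qs) / diff(qnorm(c(0.25, 0.75)))
intercept <- qs[1] - slope * qnorm(0.25)

# Informal straightness summary (illustrative, not a formal test).
r <- cor(theo, emp)
c(intercept = intercept, slope = slope, r = r)
```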

Here are Q-Q plots of vectors x (left panel) and y (right panel).

par(mfrow=c(1,2))
 qqnorm(x); qqline(x, col="darkgreen")
 qqnorm(y); qqline(y, col="darkgreen")
par(mfrow=c(1,1))

[Figure: normal Q-Q plots of x (left) and y (right), n = 100.]

With samples as large as $n = 100$ it is usually possible to distinguish samples from normal populations from samples from non-normal populations. But for smaller samples, the distinction may not be so clear.
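A small simulation makes the effect of sample size concrete: estimate how often `shapiro.test()` rejects normality at the 5% level for exponential samples of size 15 and of size 100. The seed, the 2000 replications and the helper name `reject_rate()` are arbitrary choices for this sketch; the printed rates will vary slightly with the seed.

```r
set.seed(42)   # arbitrary seed

# Hypothetical helper: fraction of exponential samples of size n
# for which the Shapiro-Wilk P-value falls below 0.05.
reject_rate <- function(n, reps = 2000) {
  rejected <- replicate(reps, shapiro.test(rexp(n, 1))$p.value < 0.05)
  mean(rejected)
}

power_15  <- reject_rate(15)
power_100 <- reject_rate(100)
c(n15 = power_15, n100 = power_100)   # the n = 100 rate should be much higher
```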

Here is a repetition of the above, but with samples of size $n = 15.$

set.seed(1213)
x = rnorm(15, 1, 1)
shapiro.test(x)$p.val
[1] 0.02057266
y = rexp(15, 1)
shapiro.test(y)$p.val
[1] 0.002650913

The Shapiro-Wilk test mistakenly rejects normality of x at the 5% level (a Type I error, and a moderately close call with P-value $\approx 0.021$), and correctly rejects normality of y with a very small P-value ($\approx 0.0027$).
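A 5%-level test is expected to make that kind of mistake in roughly 5% of samples drawn from a normal population. A simulation sketch for $n = 15$ (the seed and the 2000 replications are arbitrary choices):

```r
set.seed(1213)   # arbitrary seed
# P-values from 2000 normal samples of size 15 (same population as x above).
p_vals <- replicate(2000, shapiro.test(rnorm(15, 1, 1))$p.value)
type1_rate <- mean(p_vals < 0.05)   # fraction of samples wrongly rejected
type1_rate                          # should be close to 0.05
```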

Normal probability plots: we might be willing to excuse the ragged plot at left for x as "perhaps roughly linear", but the plot at right for y is clearly not linear.

[Figure: normal Q-Q plots of x (left) and y (right), n = 15.]

Boxplots show some left-skewness for x (at left), but three high outliers out of $n = 15$ for y.
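The points drawn individually in a boxplot are those lying more than 1.5 times the hinge spread (roughly the interquartile range) beyond the hinges; `boxplot.stats()` returns them in its `out` component. The sketch below regenerates the $n = 15$ samples with the same seed and calls as above and checks that rule for y; the names `bs`, `lo_fence` and `hi_fence` are just illustrative.

```r
set.seed(1213)
x <- rnorm(15, 1, 1)   # drawn first only so that y matches the y above
y <- rexp(15, 1)

bs <- boxplot.stats(y)              # default coef = 1.5, as used by boxplot()
hinges   <- fivenum(y)[c(2, 4)]     # lower and upper hinges
h_spread <- diff(hinges)
lo_fence <- hinges[1] - 1.5 * h_spread
hi_fence <- hinges[2] + 1.5 * h_spread

bs$out                              # values plotted as individual points
# Every flagged value lies outside the fences, every other value inside.
all(bs$out < lo_fence | bs$out > hi_fence)
inside <- y[!(y %in% bs$out)]
all(inside >= lo_fence & inside <= hi_fence)
```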

For samples as small as $n = 15,$ one can often only speculate about normality.

boxplot(x, y, col="skyblue2", pch=19, names=c("x", "y"))  # label the boxes x and y

[Figure: boxplots of x (left) and y (right), n = 15.]