Interpretation of the p-value and the test statistic W of shapiro.test in R


        Shapiro-Wilk normality test

data:  Part1
W = 0.14846, p-value = 6.478e-16

        Shapiro-Wilk normality test

data:  Part2
W = 0.47978, p-value < 2.2e-16

        Shapiro-Wilk normality test

data:  Part3
W = 0.8033, p-value = 5.043e-09

For Part1, since the p-value is approximately 0 and the test statistic is W = 0.14846, we can definitely say that this dataset does not come from a normal distribution.

For Part2, since the p-value is approximately 0 and the test statistic is W = 0.47978, we may say that it does not come from a normal distribution. But Part2 is better than Part1.

For the last one, since the p-value is approximately zero and the test statistic W = 0.8033 is closer to 1 than the others, we can conclude that its distribution is normal.

This is how I interpret the output above, but I think some of my comments are wrong. Can you comment on this?

Best answer

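All three of these P-values are far smaller than 0.001, so none of the samples seems to come from a normal population. (The null hypothesis of normality is rejected at any reasonable level of significance.) That includes Part3: a value of W closer to 1 indicates closer agreement with normality, but the decision rests on the P-value, not on comparing W across samples. Some cautionary notes are in order for the practical use of such tests: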

(1) Shapiro-Wilk often rejects for a large nearly-normal sample. If you have a large sample from a distribution that is nearly, but not exactly normal, you may get a small P-value indicating that the population is not exactly normal.

Example: 5000 values are randomly sampled from $\mathsf{Norm}(\mu=100,\sigma=10),$ and values above 125 (23 of them) are not recorded, leaving a slightly short right tail. (With 5000 observations there is enough information to detect even slight departures from normality.) The Shapiro-Wilk test rejects the null hypothesis that the data are normal, but for many practical purposes the data might be considered as normal.

set.seed(1234)  # for reproducibility
x = rnorm(5000, 100, 10);  y = x[x < 125] 
shapiro.test(y)

        Shapiro-Wilk normality test

data:  y
W = 0.99836, p-value = 4.897e-05

hist(y, prob=TRUE, col="skyblue2", xlim=c(65, 135))
curve(dnorm(x, 100, 10), add=TRUE, lwd=2, col="red")

[Figure: density histogram of y with the Norm(100, 10) density curve in red]
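
The sample-size effect described in note (1) can be explored with a small simulation. The following is only a sketch under stated assumptions: the helper names `r_trunc` and `reject_rate`, the seed, the number of replications, and the 5% level are all choices made for illustration and are not part of the original example. It estimates how often Shapiro-Wilk rejects for samples of several sizes drawn from the same truncated normal population.

```r
# Sketch: rejection rate of Shapiro-Wilk for a nearly normal population,
# Norm(100, 10) with values of 125 or more discarded, at several sample sizes.
set.seed(2024)  # arbitrary seed, chosen only for reproducibility

# Draw exactly n values from the truncated population (hypothetical helper).
r_trunc <- function(n, cutoff = 125) {
  out <- numeric(0)
  while (length(out) < n) {
    z <- rnorm(n, 100, 10)
    out <- c(out, z[z < cutoff])   # keep only values below the cutoff
  }
  out[seq_len(n)]
}

# Proportion of simulated samples for which normality is rejected at level alpha.
reject_rate <- function(n, reps = 200, alpha = 0.05) {
  mean(replicate(reps, shapiro.test(r_trunc(n))$p.value < alpha))
}

sizes <- c(50, 500, 5000)          # shapiro.test accepts 3 <= n <= 5000
rates <- sapply(sizes, reject_rate)
print(data.frame(n = sizes, reject_rate = rates))
```

If the rejection rate climbs with n while the population stays the same, that reflects the growing power of the test rather than any change in how far the data are from normal.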

(2) Shapiro-Wilk often fails to reject for a small non-normal sample. If you have a very small sample, the test may not be able to reject the null hypothesis of normality, even if the population from which the sample was taken is not normal.

Example: Ten observations are randomly sampled from $\mathsf{Beta}(2,2),$ but the Shapiro-Wilk test fails to reject normality. (A sample of size 10 is simply too small to distinguish whether or not the data come from a normal population.)

set.seed(4321)  # for reproducibility
x = rbeta(10, 2, 2)
shapiro.test(x)

        Shapiro-Wilk normality test

data:  x
W = 0.9305, p-value = 0.4528

hist(x, prob=TRUE, col="skyblue2", xlim=c(0,1))
curve(dbeta(x, 2, 2), add=TRUE, lwd=2, col="red")

[Figure: density histogram of x with the Beta(2, 2) density curve in red]
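
A single sample of size 10 shows only one outcome, so it may help to estimate how often the test rejects for Beta(2, 2) data at different sample sizes. This is a rough sketch, not a definitive power study: the function name `beta_reject_rate`, the seed, the sample sizes, the number of replications, and the 5% level are assumptions introduced here.

```r
# Sketch: estimated power of Shapiro-Wilk against Beta(2, 2) data
# at several sample sizes (all settings below are illustrative assumptions).
set.seed(2024)  # arbitrary seed, chosen only for reproducibility

# Proportion of simulated Beta(2, 2) samples of size n for which
# shapiro.test gives a P-value below alpha (hypothetical helper).
beta_reject_rate <- function(n, reps = 1000, alpha = 0.05) {
  mean(replicate(reps, shapiro.test(rbeta(n, 2, 2))$p.value < alpha))
}

beta_sizes <- c(10, 50, 200, 1000)
beta_rates <- sapply(beta_sizes, beta_reject_rate)
print(data.frame(n = beta_sizes, reject_rate = beta_rates))
```

A low estimated rejection rate at n = 10 would match the point of note (2): failing to reject normality for a small sample says little about whether the population is normal.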