Apologies for the basic question; this is really not my area at all, but I’m trying to help a friend out.
Whilst reading the Wikipedia page for the Shapiro-Wilk test, I came across the following: “As with most statistical tests, the test may be statistically significant from a normal distribution in any large samples. Thus a Q–Q plot is useful for verification in addition to the test.”
I interpret this to mean that if we sampled a large amount of data from what was in fact a Normal population, the test may in fact reject the null hypothesis that the parent population was Normal. Is this interpretation correct?
If so, why is this the case? I thought in general larger samples gave better testing?! Any intuition on this would be very much appreciated.
Let $SW$ be the Shapiro-Wilk statistic, let $F$ be its cumulative distribution function under $H_0$ (assumed continuous), and let $P = F(SW)$ be its p-value. Then $$ \Pr(P \le p) = \Pr(F(SW) \le p) = \Pr( SW \le F^{-1}(p)) = F(F^{-1}(p)) = p, $$ so $P = F(SW) \sim U[0,1]$. That is, under $H_0$ the p-value is uniformly distributed on $[0,1]$, so if you reject the null hypothesis whenever the p-value is below $0.05$, you have probability $0.05$ of falsely rejecting $H_0$ (given that $H_0$ is true). This is true regardless of the sample size.
The following simulation illustrates the distribution of $10^4$ Shapiro-Wilk p-values, each computed from a sample of size $1000$ drawn from $\mathcal{N}(0,1)$.
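The original simulation code is not reproduced here, so below is a minimal Python sketch of the experiment as described, assuming NumPy and SciPy are available. The function name `simulate_sw_pvalues`, its parameters, and the seed are my own choices for illustration; `scipy.stats.shapiro` is used as the Shapiro-Wilk implementation.

```python
import numpy as np
from scipy import stats


def simulate_sw_pvalues(n_sims=10_000, sample_size=1_000, seed=0):
    """Return Shapiro-Wilk p-values for n_sims independent N(0, 1) samples.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    pvalues = np.empty(n_sims)
    for i in range(n_sims):
        # Draw one sample from the null distribution N(0, 1).
        sample = rng.standard_normal(sample_size)
        # shapiro returns the test statistic and the p-value, in that order.
        _, pvalues[i] = stats.shapiro(sample)
    return pvalues


if __name__ == "__main__":
    pvalues = simulate_sw_pvalues()
    # Empirical false-rejection rate at the 0.05 level.
    print("share of p-values below 0.05:", np.mean(pvalues < 0.05))
    # Counts in ten equal-width bins on [0, 1]; a text stand-in for a histogram.
    counts, _ = np.histogram(pvalues, bins=10, range=(0.0, 1.0))
    print("counts per bin:", counts)
```

If the uniformity argument holds, the printed share should be close to $0.05$ and the ten bin counts should be roughly equal. Changing `sample_size` lets you check that this does not depend on how large the samples are.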