How to compute the statistical power of this normality test in R language

107 Views Asked by At

I've written the following code in R program:

n=10
delta <- vector()
for (i in 1:10000){
x <- rt(n, 1)
x <- sort(x)
z <- pnorm(x, mean(x), sd(x))
delta[i] <- sum(abs(z-seq(1,n)/n))/sum(pmax(z, seq(1,n)/n))
}

I want to compute the statistical power of the proposed normality test via simulation studies with 10000 iterations, assuming different sample sizes and various alternative distributions. For a fixed sample size n and for a pre-assumed alternative distribution, (like t-student with degree of freedom 1), the power of the test is computed as the percentage of samples out of 10000 Monte-Carlo simulated samples from the pre-assumed distribution which were rejected to follow normal distribution based on the used test. I was wondering if someone could tell me how I can compute the statistical power of this normality test in R?

1

There are 1 best solutions below

0
On BEST ANSWER

Suppose a test of normality is performed at the 5% level. This means that it will reject (have P-value below 5%) with probability $0.05$ for a random sample from a normal population. (A test at significance level 5%.)

One hopes that the test rejects with larger probability for a random sample from a non-normal distribution (its power).

No normality test has good power for all non-normal distributions, but the Shapio-Wilk test works reasonably well for a wide variety of non-normal distributions. This test is implemented in R as shapiro.test.

Below I show how to use R to find the power of the S-W test for samples of size $n=10$ from Student's t distribution with $\nu = 1$ degrees of freedom (a standard Cauchy distribution).

For one such sample, the S-W test in R looks like this. Notice that S-W does not reject for this particular sample.

set.seed(826)
shapiro.test(rt(10, 1))

        Shapiro-Wilk normality test

data:  rt(10, 1)
W = 0.96613, p-value = 0.8529

one can use $-notation to get just the P-value:

set.seed(826);  shapiro.test(rt(10,1))$p.val
[1] 0.8528541

In your question, you propose using $10^4$ iterations, but I think using $10^5$ iterations is more likely to give useful results.

Here is a simulation in R to approximate the power of the S-W normality test against the alternative that the sample is from $\mathsf{T}(\nu = 1).$ The power is about $0.5912\pm 0.0031.$

set.seed(2021)
pv = replicate(10^5, shapiro.test(rt(10,1))$p.val)
mean(pv <= 0.05)  # power
[1] 0.59117
2*sd(pv <= 0.05)/sqrt(10^5)
[1] 0.003109279

A run with samples of size $n = 20$ (not shown) gave about 87% power. Thus the S-W normality test is not so good at rejecting Cauchy samples of size 10, and considerably better at rejecting such samples of size 20.

Notes: (1) The numeric vector pv contains $100\,000$ P-values. The logical vector pv <= 0.05 contains $100\,000$ TRUEs and FALSEs (True upon rejection). Its mean is its proportion of TRUEs.

The last line of code shows an approximate 95% margin of simulation error; we are getting almost 2-place accuracy from $10^5$ iterations. (The asymptotic Wald CI is used; it works well for $100,000$ iterations.)

(2) While this style of simulation using $-notation ensures that I am using exactly the version of the Shapiro-Wilk test that is implemented in R, and the code is easy to write. However, this is not the fastest possible code because R formats the entire test output at each iteration, while we use only the P-value.

(3) If you can write a function that returns the P-value of your test of normality, then you can use analogous R code (with replicate) to explore its power.