Estimation of the Kolmogorov test power [Rstudio].


Let $x$ be a 20-element sample from Student's $t$ distribution with 4 degrees of freedom. I want to run a simulation to estimate the power of the Kolmogorov test of $H_0: F \sim N(0,1)$ at $\alpha=0.01$.

My idea:

n=1000 #number of simulations
alpha=0.1
m=c()
for (i in 1:n){
  x=rt(20,4)
  pV=ks.test(x=x,y="pnorm")$p.value
  m[i]=pV
}
power=m[m<alpha]
print(mean(power))#result

Is this approach correct?


2 Answers


I can't help you with the particular R implementation, but the statistical procedure is simple.

  1. Generate a random sample of $n = 20$ from a Student $t$ distribution with $\nu = 4$ degrees of freedom.
  2. Compute the KS test statistic for this sample against the hypothesized $N(0,1)$ distribution and the resulting $p$-value.
  3. Repeat Steps 1 and 2 many times and count the number of $p$-values that are less than $\alpha$.

The power of the test is then the number of correct rejections of the null divided by the number of simulations; i.e., the number obtained in Step 3 above divided by the total number of samples generated.
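The steps above can be sketched in R as follows (a minimal sketch; the seed and the number of simulations are arbitrary choices, and we reject whenever the $p$-value falls below $\alpha$):

```r
set.seed(1)    # for reproducibility
nsim  <- 10^4  # number of simulated samples
alpha <- 0.01

# Steps 1-2: draw a t(4) sample of size 20 and compute the KS p-value
pvals <- replicate(nsim, ks.test(rt(20, df = 4), "pnorm")$p.value)

# Step 3: the proportion of rejections estimates the power
mean(pvals < alpha)
```

With these settings the estimate should land near the $6.6\%$ figure quoted below, up to simulation error.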

My implementation in Mathematica is one line:

Length[Select[ParallelTable[KolmogorovSmirnovTest[
    RandomVariate[StudentTDistribution[4], 20]], {10^3}], # < 0.01 &]]/10.^3

This is for $10^3$ simulations of a $t$-distribution with $\nu = 4$ degrees of freedom, $n = 20$ realizations per sample, and $\alpha = 0.01$. The resulting power is small, only about $6.6\%$. With $10^6$ simulations, the power was about $0.06564$. This suggests that the Student $t$ distribution with $\nu = 4$ is difficult for the KS test to distinguish from a normal distribution at a sample size of $n = 20$.

If you increase the sample size, you would expect the power to increase. At $n = 200$, with all else the same, the power increases to about $54.7\%$. At $\alpha = 0.05$, we have a power of about $74.5\%$.
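The same experiment can be repeated in R at several sample sizes to see the power grow (a rough sketch, not the Mathematica setup above; the simulation count is kept small for speed, so the estimates are noisy):

```r
set.seed(2)

# Estimate KS-test power against N(0,1) for a t(4) sample of size n
power_ks <- function(n, alpha = 0.01, nsim = 2000) {
  pv <- replicate(nsim, ks.test(rt(n, df = 4), "pnorm")$p.value)
  mean(pv < alpha)  # proportion of rejections
}

sapply(c(20, 50, 200), power_ks)  # power increases with n
```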


You are on the right track until the very end. What you call power isn't really power.

You are isolating the P-values that are smaller than $\alpha = 0.1$ and then averaging them. That's not exactly what you want.

You want the probability of rejection. That is the probability that the P-value is less than or equal to $0.1.$

Here is my suggested code:

set.seed(602)
m = 10^5;  n = 20
pv = replicate(m, ks.test(rt(n, 4), "pnorm")$p.val )
mean(pv <= .1)
[1] 0.12209
2*sd(pv <= .1)/sqrt(m)
[1] 0.002071 # aprx 95% margin of sim err

So the estimated power is about $0.1221 \pm 0.0021.$

The vector pv <= .1 is a logical vector of TRUEs and FALSEs. Its mean is its proportion of TRUEs.
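For example, with a small made-up logical vector:

```r
x <- c(TRUE, FALSE, TRUE, TRUE)
mean(x)  # 0.75: three TRUEs out of four elements
```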


To show that your code doesn't give correct answers, I made the data standard normal, so the rejection rate should be $0.1.$ But I get about $0.05$ instead: under the null the P-values are roughly uniform on $(0,1)$, so the ones below $0.1$ average about $0.05$, which is what your code reports instead of the rejection rate.

n=1000 #number of simulations
alpha=0.1
m=c()
for (i in 1:n){
 x=rnorm(20)
 pV=ks.test(x=x,y="pnorm")$p.value
 m[i]=pV
}
power=m[m<alpha]
print(mean(power))#result
[1] 0.05010002