What happens when we use a nonparametric procedure (one that does not depend on the normal distribution) on data that are normally distributed?


If most (all?) parametric procedures have a nonparametric equivalent (one that doesn't depend on the normal distribution), then why not just stick with nonparametric procedures for all data? That would save us the need to check whether the data are normally distributed in the first place. Is it a good idea?


For normal-theory tests, the fact that the population is normal provides specific information that a corresponding nonparametric test cannot use. Usually, this means that the normal-theory test is more powerful for normal data than the corresponding nonparametric test.

Power computations for normal data. Consider a normal sample of size $n = 50.$ We wish to test $H_0: \mu = 0$ against $H_a: \mu > 0$ at the 5% level of significance. Also, suppose $\sigma=2.$ We are interested in the power against the specific alternative $\mu_a = 1,$ that is, the probability of rejecting $H_0$ when the true value is $\mu = 1.$ (Whether we know $\sigma$ or not, the power depends on $\sigma.$ In practical situations, one often needs to guess the value of $\sigma$ in order to estimate power.)

Power for t test: First, consider using the one-sample t test. There are formulas for the power of a t test, but let's approximate power using a simulation. We generate many datasets ($m=100\,000$ of them) of size 50 from $\mathsf{Norm}(\mu=1, \sigma=2)$ and test each sample to find the resulting P-value.

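We reject $H_0$ at the 5% level if the P-value is at most 0.05, so the proportion of samples for which we reject $H_0$ estimates the power. Here that proportion is $0.934,$ so the power is about 93.4%. (Simulations and computations use R. The t.test and wilcox.test calls throughout use R's default two-sided alternative; a strictly one-sided test of $H_a: \mu > 0$ would need alternative="greater" and would have somewhat higher power.)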

set.seed(702)
pv = replicate(10^5, t.test(rnorm(50, 1, 2))$p.val)
pwr = mean(pv <= 0.05); pwr
[1] 0.93426
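
The simulated power can be cross-checked against a formula-based value. The following is a minimal sketch using `power.t.test` from R's `stats` package, which works from the noncentral t distribution. The values `n = 50`, `delta = 1`, `sd = 2` come from the setup above; `alternative = "two.sided"` is chosen to match the default that `t.test` uses in the simulation, and the one-sided version is included only for comparison.

```r
# Power of the one-sample t test from the noncentral t distribution.
# n, delta, sd and the 5% level are taken from the setup in the text.
two_sided <- power.t.test(n = 50, delta = 1, sd = 2, sig.level = 0.05,
                          type = "one.sample", alternative = "two.sided")
one_sided <- power.t.test(n = 50, delta = 1, sd = 2, sig.level = 0.05,
                          type = "one.sample", alternative = "one.sided")

two_sided$power   # comparable to the simulated proportion of rejections
one_sided$power   # power if the test is carried out strictly one-sided
```

The two-sided figure should agree with the simulation to within simulation error.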

Power for Wilcoxon test. Now, what power do we get if we use the one-sample Wilcoxon signed-rank test instead of the t test? The power decreases a little, to 92.2%. The Wilcoxon test takes the center of the distribution to be the median, but for a normal distribution the mean and median are the same, so it is fair to compare the two P-values. (Exact power computations for Wilcoxon tests are difficult, so simulation may be the easiest way to get power.)

set.seed(702)
pv = replicate(10^5, wilcox.test(rnorm(50, 1, 2))$p.val)
pwr = mean(pv <= 0.05);  pwr
[1] 0.92181

Smaller samples: For smaller samples, the Wilcoxon test can have a greater power disadvantage. Consider $n = 10$ and power against the alternative $\mu_a = 2.$ The t test has power 80.5% and the Wilcoxon test has power 78.6%.

set.seed(703)
pv = replicate(10^5, t.test(rnorm(10, 2, 2))$p.val)
pwr = mean(pv <= 0.05); pwr
[1] 0.80475

set.seed(703)
pv = replicate(10^5, wilcox.test(rnorm(10, 2, 2))$p.val)
pwr = mean(pv <= 0.05); pwr
[1] 0.78646

Finally, let's look at sample size $n = 6$ and alternative $\mu_a = 4.$ Then the power is 97.0% for the t test and only 87.2% for the Wilcoxon test. For sample sizes 5 and below, the two-sided Wilcoxon test is essentially useless at the 5% level: its smallest attainable P-value is $2/2^n,$ which exceeds 0.05 when $n \le 5.$

set.seed(703)
pv = replicate(10^5, t.test(rnorm(6, 4, 2))$p.val)
pwr = mean(pv <= 0.05); pwr
[1] 0.97062

set.seed(703)
pv = replicate(10^5, wilcox.test(rnorm(6, 4, 2))$p.val)
pwr = mean(pv <= 0.05); pwr
[1] 0.87164
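
The remark about sample sizes of 5 and below can be checked directly. The sketch below assumes the exact two-sided signed-rank test that `wilcox.test` uses for small samples without ties. The most extreme outcome is that all $n$ observations are positive, which gives a P-value of $2/2^n.$ The samples `seq_len(k)` are arbitrary illustrative data, not taken from the simulations above.

```r
# Smallest two-sided P-value the exact signed-rank test can return for
# sample size n: all observations positive, so the statistic V is maximal.
n <- 3:8
formula_p <- 2 / 2^n

# seq_len(k) = 1, 2, ..., k is an illustrative all-positive sample without ties
exact_p <- sapply(n, function(k) wilcox.test(seq_len(k))$p.value)

data.frame(n = n, min_p = exact_p, can_reject_at_5pct = exact_p <= 0.05)
```

Only from $n = 6$ onward does the minimum P-value drop below 0.05, so no sample of size 5 or less can lead to rejection in the two-sided test at the 5% level.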

Power for exponential data. By contrast, now let's look at a situation with non-normal data in which the Wilcoxon test has an advantage. Suppose we have a sample of size 15 from an exponential distribution and we want to test $H_0: \mu = 1$ against $H_a: \mu > 1,$ and get the power against the specific alternative $\mu_a = 2$ (which is rate 1/2).

Wrong significance level. The first problem with the t test is that a test supposedly at the 5% level actually rejects a true $H_0$ with probability about 8.7%, because for exponential data the t statistic doesn't have a t distribution.

set.seed(704)
pv = replicate(10^5, t.test(rexp(15, 1), mu=1)$p.val)
pwr = mean(pv <= 0.05); pwr
[1] 0.08734

Poor power for t test. Then for an actual mean of 2, the t test rejects with probability only about 41%. So the t test doesn't work at all well for exponential samples of size 15 (or even much larger samples, for that matter).

set.seed(704)
pv = replicate(10^5, t.test(rexp(15, 1/2), mu=1)$p.val)
pwr = mean(pv <= 0.05); pwr
[1] 0.41033

Better power for Wilcoxon test. For the Wilcoxon test we need to use the corresponding median. If the exponential mean is 1, then the median is $-\log(1/2)=0.693.$ (For some reason, R uses mu for the hypothetical 'location' in a Wilcoxon test.)

set.seed(704)
pv = replicate(10^5, wilcox.test(rexp(15, 1/2), mu=0.693)$p.val)
pwr = mean(pv <= 0.05); pwr
[1] 0.70505
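
The median 0.693 used as the null location above comes from solving $1 - e^{-\lambda m} = 1/2,$ which gives $m = \log(2)/\lambda$ for an exponential distribution with rate $\lambda.$ As a small sketch, the same values can be read off with `qexp`; the variable names `med_null` and `med_alt` are just illustrative.

```r
# Median of an exponential distribution: solve 1 - exp(-rate * m) = 1/2,
# which gives m = log(2) / rate (log is the natural logarithm in R).
med_null <- qexp(0.5, rate = 1)     # mean 1 under the null hypothesis
med_alt  <- qexp(0.5, rate = 1/2)   # mean 2 under the specific alternative

c(med_null, med_alt)
```

Under the alternative with mean 2, the median doubles to $2\log 2,$ which is the shift the Wilcoxon test is detecting.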