What are the implications of a uniform p-value distribution?


What does a uniform p-value distribution mean? It seems so many sources say different things.

I come from a physics background, so please explain using the most basic of statistics language what the implications are for my data if this is the p-value distribution.

I was reading from two websites (noted below) that seem to state two different things for a uniform p-value distribution. Is a uniform p-value distribution good?

[Image: p-value histogram]

https://towardsdatascience.com/how-to-test-your-hypothesis-using-p-value-uniformity-test-e3a43fc9d1b6

http://varianceexplained.org/statistics/interpreting-pvalue-histogram/

Best Answer

If the test statistic is continuous, the test is exact, and the null hypothesis is true, then the P-value is uniformly distributed on the unit interval.
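A quick sketch of why this holds (the standard probability integral transform argument, written here for a one-sided test; the two-sided case is analogous): if $F$ is the continuous, strictly increasing CDF of the test statistic $T$ under $H_0$ and the P-value is $U = 1 - F(T),$ then for $0 \le u \le 1,$
$$P(U \le u) = P\big(F(T) \ge 1-u\big) = P\big(T \ge F^{-1}(1-u)\big) = 1 - F\big(F^{-1}(1-u)\big) = u,$$
which is exactly the CDF of $\mathsf{Unif}(0,1).$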

I will illustrate this, beginning with a sample of size $n=10$ from $\mathsf{Norm}(\mu,\sigma),$ where both parameters are unknown. We test $H_0: \mu = 0$ against $H_a: \mu \ne 0$ at level $\alpha = 0.05 = 5\%,$ using a one-sample t test. The test statistic is $T=\frac{\bar X - \mu_0}{S/\sqrt{n}}.$

Assuming $H_0$ to be true, $T \sim \mathsf{T}(\nu = n-1),$ Student's t distribution with $n-1$ degrees of freedom, where $\bar X$ is the sample mean and $S$ is the sample standard deviation. If the observed value of $T$ is $t,$ then the P-value is $P(|T| \ge |t| \mid H_0).$ The null hypothesis is rejected if the P-value is smaller than $0.05 = 5\%.$

For example, consider the specific data sampled in R below.

set.seed(422)
x = rnorm(10, 0, 1); x
[1] -0.2051078  1.7241359 -1.3088931  0.3872518 -1.9000603
[6]  1.0360439  0.8976141  0.3462825  0.6213729  0.1859934

a = mean(x);  a
[1] 0.1784633
s = sd(x);  s
[1] 1.084708

t = (a - 0)/(s/sqrt(10));  t
[1] 0.5202788

pv = pt(-t,9) + 1 - pt(t,9);  pv
[1] 0.6154243

These computations are performed and summarized by the R procedure t.test(x), where the null value $\mu_0=0$ and a two-sided alternative are assumed unless the contrary is specified. Below is output for the sample x above; notice that a 95% confidence interval for $\mu$ is also provided, but that is not part of our current discussion. $H_0$ is not rejected because the P-value exceeds 5%. That is, $\bar X = 0.1784633$ is not significantly different from $\mu_0 = 0.$

t.test(x)

        One Sample t-test

data:  x
t = 0.52028, df = 9, p-value = 0.6154
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.5974901  0.9544167
sample estimates:
mean of x 
0.1784633 

If we just want to see the P-value, we can use R's $-notation to extract just that component:

t.test(x)$p.val
[1] 0.6154243

Thus, for one normal sample, we have found that the P-value is $0.6154$ and $H_0: \mu=0$ is not rejected at the 5% level of significance. Of course, different samples from the same normal distribution will have different values of $\bar X,$ $S,$ $t,$ and hence different P-values.

If we want to see the P-value when the null hypothesis is false, we can do that too: testing $H_0: \mu = 2$ with the same data (drawn from a distribution with $\mu = 0$) gives a P-value smaller than 5%, so we do reject $H_0.$ More on this later.

t.test(x, mu=2)$p.val
[1] 0.0004871346

Considering the P-value as a random variable, we can ask what its distribution is in the circumstances of using $n = 10$ observations from $\mathsf{Norm}(\mu=0,\sigma=1)$ to test $H_0: \mu = 0$ against $H_a:\mu\ne 0$ at the 5% level. Perhaps the key question is "What is the probability that $H_0$ will be rejected?" The answer ought to be $0.05.$ But we can ask about other significance levels as well.

By simulating $m=100{,}000$ samples of size ten from $\mathsf{Norm}(0,1),$ we can get $m$ P-values and make a histogram of them to get an idea of the distribution of the P-value of the one-sample t test when $H_0$ is true.

set.seed(2021)
pv = replicate(10^5, t.test(rnorm(10,0,1))$p.val)
mean(pv <= 0.05)  # aprx rejection probability
[1] 0.05093       # aprx significance level 5%

In the histogram below, the left-most bar has 5% of the probability and represents the significance level of the test (the false rejections that occur when $H_0$ is true).

[Figure: histogram of simulated P-values with the $\mathsf{Unif}(0,1)$ density overlaid]

R code for figure:

hdr = "Uniform Dist'n of P-value when Null Hypothesis True"
hist(pv, prob=T, col="skyblue2", main=hdr)
 curve(dunif(x), add=T, -.1, 1.1, col="red", lwd=2, n=1001)
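Uniformity means $P(\text{P-value} \le \alpha) \approx \alpha$ at every level $\alpha,$ not just at 5%. As a quick check, here is a sketch using a smaller simulation (the seed and $m = 10^4$ are arbitrary choices):

```r
# Simulate P-values of the one-sample t test under a true H0, then
# compare estimated rejection probabilities against several alpha levels
set.seed(2021)
pv = replicate(10^4, t.test(rnorm(10, 0, 1))$p.val)
alpha = c(0.01, 0.10, 0.25, 0.50)
rbind(alpha, rej.prob = sapply(alpha, function(a) mean(pv <= a)))
```

Each estimated rejection probability should land near the corresponding $\alpha.$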

Another simulation shows the non-uniform distribution of the P-value when $H_0$ is false. A good test will often reject when $H_0$ is false; accordingly, the distribution of the P-value puts much of its probability on values near $0.$ In the samples below the true mean is $\mu = 1.5,$ while the null value remains $\mu_0 = 0,$ so that $H_0: \mu = 0$ is not true. Rejection is likely.

set.seed(1235)
pv = replicate(10^5, t.test(rnorm(10, 1.5, 1))$p.val)  # Ho false
mean(pv <= 0.05)  # aprx rejection probability
[1] 0.98708       # aprx power 99% of test against alternative mu = 1.5

[Figure: histogram of P-values concentrated near 0]

hdr = "Non-Uniform Dist'n of P-value when Null Hypothesis False"
hist(pv, prob=T, col="skyblue2", main=hdr)

Finally, when the test statistic is discrete or the test is approximate, the distribution of the P-value will not necessarily be uniform, even when $H_0$ is true. However, one can hope that the probability below 0.05 is about 0.05. Here is an example, showing the distribution of the P-value of the Wilcoxon signed rank test for small samples from the standard normal distribution.

set.seed(1776)
pv = replicate(10^5, wilcox.test(rnorm(10))$p.val)
mean(pv <= .05)
[1] 0.04813  # signif level near 5%

[Figure: histogram of Wilcoxon P-values, roughly but not exactly uniform]

hdr = "Non-Uniform Dist'n of P-value of Wilcoxon SR Test: Null True"
hist(pv, prob=T, col="skyblue2", main=hdr)
 curve(dunif(x), add=T, -.1, 1.1, col="red", lwd=2, n=1001)
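One way to see the discreteness directly: for $n = 10$ the signed-rank statistic can take only the integer values $0$ through $55,$ so only a limited number of distinct P-values can ever occur, no matter how many samples we draw. A quick sketch:

```r
# Count the distinct Wilcoxon signed-rank P-values across many simulated samples;
# the count is bounded by the finite support of the discrete test statistic
set.seed(1776)
pv = replicate(10^4, wilcox.test(rnorm(10))$p.val)
length(unique(pv))  # far fewer distinct values than 10^4
```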