What does a uniform p-value distribution mean? It seems so many sources say different things.
I come from a physics background, so please explain using the most basic of statistics language what the implications are for my data if this is the p-value distribution.
I was reading from two websites (noted below) that seem to state two different things for a uniform p-value distribution. Is a uniform p-value distribution good?
http://varianceexplained.org/statistics/interpreting-pvalue-histogram/
If the test statistic is continuous, the test is exact, and the null hypothesis is true, then the P-value is uniformly distributed on the unit interval.
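One way to see why, sketched for a one-sided test to keep the notation short (this simplification is an assumption of the sketch, not part of the setup below): let $F_0$ be the continuous CDF of the test statistic $T$ under $H_0,$ and define the P-value as $P = 1 - F_0(T).$ By the probability integral transform, $F_0(T) \sim \mathsf{Unif}(0,1)$ under $H_0,$ so for any $u$ in $[0,1],$

$$P(P \le u \mid H_0) = P\big(F_0(T) \ge 1-u \mid H_0\big) = 1 - (1-u) = u,$$

which is the CDF of $\mathsf{Unif}(0,1).$ In particular, rejecting when $P \le \alpha$ gives a false-rejection probability of exactly $\alpha.$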
I will illustrate this, beginning with a sample of size $n=10$ from $\mathsf{Norm}(\mu,\sigma),$ where both parameters are unknown. We test $H_0: \mu = 0$ against $H_a: \mu \ne 0$ at level $\alpha = 0.05 = 5\%,$ using a one-sample t test. The test statistic is $T=\frac{\bar X - \mu_0}{S/\sqrt{n}}.$
Assuming $H_0$ to be true, $T \sim \mathsf{T}(\nu = n-1),$ Student's t distribution with $n-1$ degrees of freedom, where $\bar X$ is the sample mean and $S$ is the sample standard deviation. If the observed value of $T$ is $t,$ then the P-value is $P(|T| \ge |t| \,|\, H_0).$ The null hypothesis is rejected if the P-value is smaller than $0.05 = 5\%.$
For example, let specific data be as sampled in R below.
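A minimal sketch of how such a sample might be drawn. The seed and the choice of $\mathsf{Norm}(0,1)$ are assumptions made for illustration, so the values will not match the numbers quoted in this answer.

```r
# Hypothetical seed; the seed behind the numbers quoted in the text is unknown.
set.seed(2021)

n <- 10
# Ten draws from Norm(mu = 0, sigma = 1), so H0: mu = 0 is actually true here.
x <- rnorm(n, mean = 0, sd = 1)

round(x, 3)   # the data
mean(x)       # sample mean (X-bar)
sd(x)         # sample standard deviation (S)
```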
These computations are performed and summarized by the R procedure `t.test(x)`, where the null value $\mu_0=0$ and a two-sided alternative are assumed unless the contrary is specified. Below is output for the sample `x` above; notice that a 95% confidence interval for $\mu$ is also provided, but that is not part of our current discussion. $H_0$ is not rejected because the P-value exceeds 5%. That is, $\bar X = 0.1784633$ is not significantly different from $\mu_0 = 0.$

If we just want to see the P-value, we can use `$`-notation to show just that:

Thus, for one normal sample, we have found that the P-value is $0.6154$ and $H_0: \mu=0$ is not rejected at the 5% level of significance. Of course, different samples from the same normal distribution will have different values of $\bar X, S, t$ and hence different P-values.
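The calls described above can be sketched as follows, together with a check that the P-value reported by `t.test` agrees with the definition $P(|T| \ge |t| \,|\, H_0).$ The seed is an arbitrary assumption, so the printed numbers will differ from the $0.6154$ quoted above.

```r
set.seed(2021)                      # hypothetical seed, for reproducibility only
x <- rnorm(10, mean = 0, sd = 1)    # sample with H0: mu = 0 true
n <- length(x)

res <- t.test(x)    # defaults: mu = 0, alternative = "two.sided"
res                 # full summary, including the 95% confidence interval
res$p.value         # just the P-value, via $-notation

# Recompute the P-value from the definition, using the T(n - 1) distribution.
t.obs    <- (mean(x) - 0) / (sd(x) / sqrt(n))
p.manual <- 2 * pt(-abs(t.obs), df = n - 1)

all.equal(unname(res$statistic), t.obs)   # same test statistic
all.equal(res$p.value, p.manual)          # same P-value
```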
If we want to see the P-value when the null hypothesis is false we can do that too. Notice that now the P-value is smaller than 5%, so we do reject $H_0.$ More on this later.
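A sketch of such a case, assuming the data come from $\mathsf{Norm}(1.5, 1)$ (the value $1.5$ is the one used later in this answer; the seed is arbitrary). With this large a shift, rejection is very likely, though not guaranteed for any particular sample.

```r
set.seed(2021)                      # hypothetical seed
y <- rnorm(10, mean = 1.5, sd = 1)  # true mean is 1.5, so H0: mu = 0 is false

p.false <- t.test(y)$p.value        # still testing H0: mu = 0
p.false
p.false < 0.05                      # TRUE means H0 is rejected at the 5% level
```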
Considering the P-value as a random variable, we can ask what its distribution is in the circumstances of using $n = 10$ observations from $\mathsf{Norm}(\mu=0,\sigma=1)$ to test $H_0: \mu = 0$ against $H_a:\mu\ne 0$ at the 5% level. Perhaps the key question is "What is the probability that $H_0$ will be rejected?" The answer ought to be $0.05.$ But we can ask about other significance levels as well.
By simulating $m=100{,}000$ samples of size ten from $\mathsf{Norm}(0,1),$ we can get $m$ P-values, and make a histogram of them to get an idea of the distribution of the P-value of the one-sample t test when $H_0$ is true.
In the histogram below the left-most bar has 5% of the probability, and represents the significance level of the test (the few false rejections when $H_0$ is true).
R code for figure:
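The original code was not preserved here; the following is a reconstruction under assumptions (the seed, the bin width of $0.05,$ and the plot labels are my choices). It simulates the P-values under $H_0,$ checks the rejection rate at a few levels, and draws the histogram with a reference line at density $1.$

```r
set.seed(1234)          # hypothetical seed
m <- 10^5               # number of simulated samples
n <- 10                 # size of each sample

# One P-value per simulated sample from Norm(0, 1), where H0: mu = 0 is true.
pv <- replicate(m, t.test(rnorm(n, 0, 1))$p.value)

# Proportion of rejections at several levels; each should be close to alpha.
alpha <- c(0.01, 0.05, 0.10)
rej   <- sapply(alpha, function(a) mean(pv <= a))
rbind(alpha, rej)

# Bins of width 0.05, so the left-most bar corresponds to the 5% level.
hist(pv, prob = TRUE, breaks = seq(0, 1, by = 0.05),
     main = "P-values of t test when H0 is true", xlab = "P-value")
abline(h = 1, lty = "dashed")   # density of Unif(0, 1)
```

Because `t.test` is called $m$ times, this may take several seconds to run.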
Another simulation shows the non-uniform distribution of the P-value when $H_0$ is false. A good test will often reject when $H_0$ is false. Accordingly, the distribution of the P-value puts much of its probability on values near $0.$ In the samples below the true mean is $\mu = 1.5,$ so that $H_0: \mu = 0$ is not true. Rejection is likely.
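A sketch of that simulation, assuming samples of size ten from $\mathsf{Norm}(1.5, 1)$; the seed and the smaller number of replications ($10^4,$ chosen only to keep the run short) are my assumptions.

```r
set.seed(1234)          # hypothetical seed
m <- 10^4               # fewer replications than in the text, for speed
n <- 10

# Data have true mean 1.5, but the test is still of H0: mu = 0.
pv.alt <- replicate(m, t.test(rnorm(n, 1.5, 1))$p.value)

mean(pv.alt <= 0.05)    # estimated power of the test at the 5% level

hist(pv.alt, prob = TRUE, breaks = seq(0, 1, by = 0.05),
     main = "P-values of t test when H0 is false", xlab = "P-value")
```

The proportion of P-values at or below $0.05$ is now the power of the test against this alternative rather than its significance level.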
Finally, when the test statistic is discrete or the test is approximate, the distribution of the P-value will not necessarily be uniform, even when $H_0$ is true. However, one can hope that the probability below 0.05 is about 0.05. Here is an example, showing the distribution of the P-value of the Wilcoxon signed-rank test for small samples from the standard normal distribution.
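A sketch of such a simulation, assuming samples of size $n = 10$ (the sample size, seed, and number of replications are my assumptions). Because the signed-rank statistic takes only finitely many values, so does the P-value.

```r
set.seed(1234)          # hypothetical seed
m <- 10^4               # number of simulated samples
n <- 10                 # assumed "small" sample size

# One-sample Wilcoxon signed-rank test of location 0 (the default mu = 0).
# With n = 10 continuous observations and no ties, R uses the exact test.
pv.w <- replicate(m, wilcox.test(rnorm(n, 0, 1))$p.value)

length(unique(pv.w))    # only a handful of distinct P-values occur
mean(pv.w <= 0.05)      # rejection rate at the 5% level

# Narrow bins make the discreteness of the P-value visible.
hist(pv.w, prob = TRUE, breaks = seq(0, 1, by = 0.01),
     main = "P-values of Wilcoxon signed-rank test, H0 true",
     xlab = "P-value")
```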