Does comparing two p-values make sense?


For example, suppose the p-value for the relationship between willingness to pay and the number of owned cars is 0.3,

and the p-value for the relationship between willingness to pay and the number of owned pets is 0.6.

Can I claim that the number of owned cars has a stronger relationship with willingness to pay, and that it explains willingness to pay better than the number of owned pets does?

I know that a p-value less than 0.05 is significant, but I am not sure whether two p-values that are both larger than 0.05 can be compared.

1 Answer
Absent requested clarifications, I can only make generic comments on the proper uses of P-values.

If a chi-squared goodness-of-fit test or test for independence has a statistic $Q$ that is approximately distributed as $\mathsf{Chisq}(\text{df} = 5),$ then the critical values for tests at the 5% and 1% levels, respectively, are $c = 11.07$ and $c = 15.09.$ You can find these values on row 5 of the table to which you linked; I have found them using R statistical software below:

qchisq(c(.95, .99), 5)
[1] 11.07050 15.08627

So if your computed value of the test statistic is $Q = 12.33,$ you can reject the null hypothesis at the 5% level, but not at the 1% level.

Nowadays, most statistical software gives P-values instead of dealing with specific fixed levels of significance. Software can do that because it can find more detailed information about a particular distribution (for example, $\mathsf{Chisq}(\text{df} = 5)$) than is convenient to print in a published table.

Specifically, the P-value 0.0305 corresponding to $Q = 12.33$ is the area under the density function for $\mathsf{Chisq}(\text{df} = 5)$ to the right of 12.33. You would reject at the 5% level because $0.0305 < 0.05,$ but not at the 1% level because $0.0305 > 0.01.$

1 - pchisq(12.33, 5)
[1] 0.03053538

Thus, given the P-value, a person can choose their own significance level and determine whether the test shows a significant result at that level. So it is fair to say that small P-values are useful for determining the result of a test, and that a tiny P-value such as 0.0003 indicates stronger evidence against $H_0$ than does a larger one such as 0.045, even though both P-values lead to rejection at the 5% level.
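As a quick illustration of that decision rule (my own sketch, not part of the original answer), the P-value computed above can be compared against any candidate significance levels directly in R:

```r
# P-value for Q = 12.33 with df = 5, as computed earlier
p <- 1 - pchisq(12.33, 5)

round(p, 4)          # about 0.0305
p < c(0.05, 0.01)    # reject at 5%? at 1%? (first TRUE, second FALSE)
```

The same single P-value answers the question at every level at once, which is exactly why software reports P-values rather than fixed-level accept/reject decisions.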

However, it is not generally useful to make distinctions between the 'information contained' in larger P-values such as 0.3 and 0.6. That is because, assuming $H_0$ to be true, the P-value is a random variable that is approximately uniform on the interval $(0,1).$ For a continuous test statistic, such as $Z$ in a normal test or $T$ in a t test, one can prove that P-values are precisely $\mathsf{Unif}(0,1).$ For most discrete test statistics P-values are roughly, but not exactly uniform. (One usually explores the distributions of such P-values through simulation.)
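The uniformity claim for a continuous test statistic is easy to verify by simulation. Here is a minimal sketch (mine, not from the original answer): two-sided z-test P-values for $H_0: \mu = 0$ with known $\sigma = 1,$ computed from normal data generated under $H_0,$ should be distributed $\mathsf{Unif}(0,1).$

```r
set.seed(2024)
m <- 10^5;  n <- 20                 # 100,000 tests, each on n = 20 observations
pv.z <- replicate(m, {
  x <- rnorm(n, 0, 1)               # data generated under H0: mu = 0
  z <- sqrt(n) * mean(x)            # z statistic with sigma = 1 known
  2 * pnorm(-abs(z))                # two-sided P-value
})
mean(pv.z <= 0.05)                  # close to 0.05, as uniformity predicts
hist(pv.z, breaks = 20)             # approximately flat
```

Under $H_0,$ a P-value of 0.3 and a P-value of 0.6 are just two draws from this flat distribution, which is why neither carries much information about "how true" the null is.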

The test statistic $Q$ for a chi-squared goodness-of-fit statistic is discrete, because its values are based on integer counts. A simple example is to see what happens in repeated tests whether a die is fair. If a die is rolled $n = 600$ times, then we ought to see each of the six faces "about 100" times. The purpose of the chi-squared statistic is to assess whether the actual face counts are sufficiently close to the expected 100 to say results are consistent with a fair die.

The R code below simulates 100,000 such 600-roll experiments and finds the test statistic $Q = \sum_{i=1}^6 \frac{(X_i-100)^2}{100}$ for each experiment. Then we can make a histogram of the 100,000 values of $Q$ and also a histogram of the corresponding 100,000 P-values.

set.seed(1234)
m = 10^5;  n = 600;  E = n/6;  die = 1:6;  q = numeric(m)
for (i in 1:m) {
  faces = sample(die, n, replace=TRUE)  # n rolls of a fair die
  X = tabulate(faces, nbins=6)          # counts of faces 1 through 6
  q[i] = sum((X - E)^2/E)               # chi-squared statistic
}

mean(q >= 11.07)
[1] 0.04864

pv = 1 - pchisq(q, 5)
mean(pv <= .05)
[1] 0.04864

Because rolls of fair dice are simulated, it is not surprising to see that $Q > 11.07$ for about 5% of the 600-roll experiments. Equivalently, about 5% of the P-values are below 0.05.

From the histogram we can see that $Q$ has approximately the target chi-squared distribution, rejecting for values to the right of the vertical broken line. Also, the P-values are approximately uniformly distributed, rejecting for values to the left of the vertical line.

[Histograms of the 100,000 simulated values of $Q$ and of the corresponding P-values]

The point of this demonstration is that the uniform distribution of P-values makes it difficult to say that particular P-values such as .3 and .6 are more remarkable or meaningful than others. Ordinarily, we only care about whether P-values are small enough to lead to rejection at our chosen significance level.