In hypothesis testing, the definition of p value is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.
My question is why the "at least as extreme" part? Why is it not enough to consider only the probability of obtaining the test result?
For example:
A hypothesis test on the fairness of a coin.
H0: P(Heads) = 0.5
HA: P(Heads) > 0.5
We carry out a test on the coin with the result being 8 heads out of 10 coin flips.
The p-value is P(8 Heads|H0 is True) + P(9 Heads|H0 is True) + P(10 Heads|H0 is True).
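For concreteness, that tail sum can be computed exactly from the Binomial(10, 0.5) pmf (a minimal Python sketch; the helper name `binom_pmf` is just for illustration):

```python
from math import comb

def binom_pmf(k, n=10, p=0.5):
    # P(X = k) for X ~ Binomial(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# One-sided p-value: P(X >= 8) = P(8) + P(9) + P(10) under H0: p = 0.5
p_value = sum(binom_pmf(k) for k in range(8, 11))
print(p_value)  # 0.0546875, i.e. 56/1024
```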
My question is: why is the p-value not just P(8 Heads|H0 is True)? Why care about the probability of 9 heads and 10 heads when the test gave exactly 8 heads?
This is explained in lots of different places, but let's recap the idea.
The first thing is to understand the word "extreme" here. In hypothesis testing, we define a certain critical region for the test statistic, such that when the test statistic falls into that critical region, we reject the null hypothesis.
That means there is a certain "direction" for the test statistic: the closer it gets to one extreme, the more it favors the alternative hypothesis - i.e., the stronger the evidence for the alternative.
In your example, by a likelihood ratio argument (or simply by intuition), a larger test statistic favors the alternative. With a fixed sample size, choosing the actual boundary/cutoff of the critical region is always a trade-off between the Type-I and Type-II errors. Usually we use the significance level of the test to control the Type-I error.
Back to your example: if you observe an $8$ and decide to reject the null, then you should also reject the null when you observe more extreme test statistics - larger values like $9$ or $10$ - since they favor the alternative even more strongly. Duality plays a role here: imagine the critical region is defined as $\{8, 9, 10\}$ and compute the Type-I error of this test - it is exactly the p-value.
Instead of fixing the critical region in advance, one can specify the significance level of the test and compare it with the p-value. Either way, we are controlling the Type-I error.
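The equivalence of the two decision rules can be checked directly: at a given significance level $\alpha$ (here $\alpha = 0.10$, an illustrative choice), "reject when the statistic is at or beyond the cutoff" and "reject when the p-value is at most $\alpha$" give the same decision for every possible observation. A sketch:

```python
from math import comb

def pmf(k, n=10, p=0.5):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_value(x, n=10):
    # One-sided p-value: P(X >= x) under H0
    return sum(pmf(k, n) for k in range(x, n + 1))

alpha = 0.10
# Rule 1: fixed critical region - smallest cutoff c with P(X >= c | H0) <= alpha
cutoff = next(c for c in range(11) if p_value(c) <= alpha)
# Rule 2: reject whenever the p-value is <= alpha
for x in range(11):
    assert (x >= cutoff) == (p_value(x) <= alpha)
print(cutoff)  # 8
```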
The bottom line: the critical region of this test should not be $\{8\}$ alone - it makes no sense to reject when the test statistic is $8$ but fail to reject when it is $9$ or $10$. Such a test is always sub-optimal compared to the one with critical region $\{8, 9, 10\}$.
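The sub-optimality is easy to see by comparing power under an alternative (the value $p = 0.7$ below is just an illustrative choice): rejecting only on $\{8\}$ throws away the probability mass at $9$ and $10$, so the test with region $\{8, 9, 10\}$ rejects strictly more often when the alternative is true.

```python
from math import comb

def pmf(k, p, n=10):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def power(region, p, n=10):
    # Probability of rejecting H0 when the true heads probability is p
    return sum(pmf(k, p, n) for k in region)

p_alt = 0.7  # illustrative alternative: a biased coin
print(power({8}, p_alt))          # rejects only on exactly 8 heads
print(power({8, 9, 10}, p_alt))   # rejects on 8 or more heads - strictly larger
```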