$p$-value and rejecting the null hypothesis

Question


311 Views Asked by Bumbble Comm At 10 May 2026 - 9:09

I have been studying the basics of the Hypothesis Test in my statistics class, more specifically, studying the Analysis of Variance F-test. My question has to do with the $p$-value. Thus, here is my question:

If the p-value of the test comes out to be a small number, this fact is taken as justification for "rejecting the null hypothesis." Why is this a reasonable conclusion?

I am not sure what my professor meant by a "small number", but after doing some research, it turns out that a p-value $\leq .05$ is commonly taken to mean that we can reject the null hypothesis. He remarked that if I simply said, for example, "this is the rule that statisticians use", it would not be proper justification. So, if I do end up obtaining a "small number" for my p-value, why can I reject the null hypothesis?


There are 3 best solutions below

Bumbble Comm On 25 Apr 2018 - 1:03

If there is one thing a p-value most assuredly is NOT, it is the probability that the null hypothesis is true. Rather, the p-value is the probability, IF the null hypothesis is true, of observing a value of the test statistic at least as extreme as the one actually observed. So while small p-values suggest that the data may not be consistent with the null hypothesis, the data may be even less consistent with other reasonable hypotheses. Also, one often tests many hypotheses (e.g., hundreds of possible regressors). In those sorts of situations, if one accepted one in twenty of the variables that didn't matter, they might swamp the ones that did matter. (This is the multiple testing problem.) For this reason, thresholds much smaller than 0.05 are often appropriate before one rejects the null.
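To put rough numbers on the multiple testing problem, here is a minimal sketch. It assumes the tests are independent and that every null hypothesis is true, which is a simplification. The function names are made up for illustration, and the Bonferroni threshold is one common correction, not something this answer prescribes.

```python
def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one false rejection among m independent tests
    of true null hypotheses, each run at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(m: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the overall false-rejection chance
    at or below alpha (Bonferroni correction)."""
    return alpha / m


for m in (1, 20, 100):
    print(m, round(familywise_error_rate(m), 3), bonferroni_threshold(m))
```

With 20 irrelevant variables the chance of at least one spurious "discovery" at the 0.05 level is already well above one half, which is why a much stricter per-test threshold is often used.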

If you want to interpret p-values as probabilities of the truth of the null hypothesis, I suggest multiplying them by ten first. That will give you a very rough guide. For more details, see Sellke, Bayarri, and Berger, "Calibration of p Values for Testing Precise Null Hypotheses", The American Statistician, 2001.
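For comparison with the "multiply by ten" rule of thumb, the calibration in the cited paper can be sketched as follows. It uses the bound $-e\,p\ln p$ (valid for $p < 1/e$) on the Bayes factor in favour of the null. Turning that into a posterior probability additionally assumes prior odds of 1:1, which is my choice for illustration. The function names are hypothetical.

```python
from math import e, log


def bayes_factor_bound(p: float) -> float:
    """Lower bound -e * p * ln(p) on the Bayes factor in favour of the null."""
    if not 0.0 < p < 1.0 / e:
        raise ValueError("the bound applies only for 0 < p < 1/e")
    return -e * p * log(p)


def min_posterior_null(p: float) -> float:
    """Smallest posterior probability of the null, ASSUMING prior odds of 1:1."""
    b = bayes_factor_bound(p)
    return b / (1.0 + b)


for p in (0.05, 0.01, 0.001):
    print(p, round(min_posterior_null(p), 3))
```

Because this is a lower bound, the posterior probability of the null can be larger than the printed value, which is consistent with treating p-values as weaker evidence than they first appear.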

Bumbble Comm On 25 Apr 2018 - 1:31

The .05 significance level has nothing magical about it and is used by tradition. I suppose you know that the Null Hypothesis states that there is no difference between samples. We start out by assuming it is true. Let's take an example without a Significance Level at first, so that you can understand it intuitively.

Let's say I have a friend who claims he has a super power: he can predict whether a coin in my hand shows heads or tails, even before peeking at which side it shows. Let's say he specifically claims he can ... We decide to put his theory to the test.

A coin toss is a Bernoulli trial: on each toss he either succeeds or fails, so the number of correct calls over a series of tosses follows a Binomial Distribution. We agree to toss the coin $n = 100$ times. If he has no powers, the probability of him getting it right (or wrong) at each toss is $p = 0.5$, and since the Expectation of the Binomial Distribution is $E[X] = np$, the expectation in this case is $100 \times 0.5 = 50$. This means that we expect a person who does NOT have any super power to get about 50 "predictions" right out of pure luck. So, for us to be ready to admit he may indeed have some super powers, he needs to beat this number by a wide margin (i.e. do much better than what the average person with no powers can do out of simple luck). The more he exceeds that expectation, the more we can concede that he did something very unlikely that cannot be explained solely by luck, and that he may indeed have some powers. At some point we can be satisfied enough to reject the Null Hypothesis and assert that yes, there IS a significant difference between his performance and the average performance.

Let's say this friend of ours managed to correctly predict 54 coins. He's so happy about his performance that he tells us that's more than enough evidence for his powers. But you can see that 54 is not that far from 50, which is what we expect a normal person with no powers to achieve. If you compute the probability of getting 54 correct calls (or more) with the given parameters, you'd find that by pure chance, a person has a probability of about 0.24 of achieving this performance. That's a high number, meaning a person with no powers can easily achieve this same performance by pure chance. So him managing to beat the expectation can be entirely explained by chance, and we don't have enough evidence against the Null Hypothesis: we fail to reject it.

Let's take another case where he managed to predict 68 coin tosses. Computing this using the Binomial Distribution we get around 0.0001 probability: this is a VERY extreme event, an extremely rare occurrence that is very unlikely to happen, and yet it did. The way we view this is: assuming the Null Hypothesis is true (assuming the friend has no powers as a start), the probability of an average person with no special powers achieving this is 0.0001. Since an average person is extremely unlikely to achieve such a performance, it cannot be attributed entirely to small variations around the expectation due to pure luck. We are therefore ready to concede that there is enough evidence against the Null Hypothesis, and so we reject it.
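The two tail probabilities in this example can be checked with a short sketch. It assumes a one-sided test (only "at least this many correct calls" counts as extreme) and 100 tosses with success probability 0.5 under the Null Hypothesis. The function name `upper_tail` is mine. The exact values may differ somewhat from the rounded figures quoted in the text.

```python
from math import comb


def upper_tail(k: int, n: int = 100, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more correct
    calls when each call succeeds with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


for correct in (54, 68):
    print(correct, upper_tail(correct))
```

The first value is large enough that luck is a perfectly good explanation, while the second is on the order of one in several thousand.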

This is the intuition behind the p-value. You asked: "If the p-value of the test comes out to be a small number, this fact is taken as justification for 'rejecting the null hypothesis.' Why is this a reasonable conclusion?"

Well, in our coin toss example, the p-value = 0.0001. This is a very small probability, and so you can see why it gives us evidence against the Null Hypothesis. A smaller and smaller p-value means our friend performs better and better: he achieves more and more extreme, very unlikely outcomes that cannot be attributed to luck alone. The smaller the p-value, the further our friend's performance moves into the extreme right tail of the distribution. And since by definition the Null Hypothesis is the assumption that there is no difference between his performance and the average expected performance, him moving further and further away from the mean gives stronger and stronger evidence against the Null Hypothesis. That's why small p-values make us reject the Null Hypothesis.

Finally, about the Significance Level, as I said that's just by tradition. There's much debate about what's an appropriate level. In our example, 0.0001 is a lot smaller than 0.05 and thus if we were using that level our Null Hypothesis would've been rejected.

**Bumbble Comm** · Accepted Answer

It might be helpful to do this exercise on a more concrete example than the F-test... so let's consider the classic example of coin flips.

Here our null hypothesis will be that the coin is fair. Let us say that we flipped the coin $100$ times and got $99$ heads. Pretty much anyone would be comfortable rejecting the null hypothesis under this circumstance. Why? Because the probability of a fair coin being flipped $100$ times and coming up heads $99$ of those times is tiny!

We were able to decide this on the basis of our intuition. But what if it were $75$ heads instead of $99$? How about $60$? We need a way to quantify. We use the idea in the last sentence of my previous paragraph. We said the probability of the experiment coming out that way was tiny, but how tiny was it? This probability is called the p-value.

Stated semi-formally, the p-value is the probability under the null hypothesis that the experiment will come out as extreme or more extreme than the data you have in front of you. We can calculate exactly the probability that a fair coin will come out with either $0$, $1$, $99$ or $100$ heads (the $0$, $1$ and $100$ are included because they are as extreme, or more extreme, than $99$ heads). It is $$ p = 2\cdot\frac{1}{2^{100}} + 2\cdot\frac{100}{2^{100}} \approx 1.6\times 10^{-28}.$$ This is the p-value. Notice it is a tiny number, indicating that the results we saw are extremely unlikely under the null hypothesis. This gives us good confidence in rejecting it.

Now we can compare to the case where we see $75$ heads or $60$ heads out of $100$. For $75$ heads, we can compute a p-value of $5.6\times 10^{-7}$ and for $60$ heads we get $0.057.$ Now we have a little more perspective on these less obvious cases. It turns out getting $75$ heads/tails or more is a one in a million occurrence if the coin is fair. So if it wasn't obvious before we can feel more confident in rejecting the null hypothesis that the coin is fair.

And we see the probability of getting $60$ or more heads (or tails) is around $6\%.$ This is unlikely, but not that unlikely. So we might be more cautious about rejecting the null hypothesis... it's possible the coin is biased, but it's also plausible that it was just a statistical fluke that there were $60$ heads.
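The three p-values above can be reproduced with a small sketch. It assumes the two-sided definition used in this answer: outcomes at least as far from $50$ heads as the observed count, in either direction. The function name is mine.

```python
from math import comb


def two_sided_p_value(heads: int, n: int = 100) -> float:
    """Probability that a fair coin gives a head count at least as far
    from n/2 as the observed one, in either direction."""
    distance = abs(heads - n / 2)
    # Count the equally likely sequences whose head count is that extreme.
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance)
    return extreme / 2**n


for heads in (99, 75, 60):
    print(heads, two_sided_p_value(heads))
```

Counting sequences with exact integers and dividing once at the end avoids rounding problems for the very small tail at $99$ heads.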

Where to draw the line is subjective. A historical convention, as you mentioned, is to use a threshold of $5\%$ (whereby we would elect to retain the null hypothesis for the case of $60$ heads). So we pick a line (before the experiment is done) based on how often we're comfortable being wrong, and then reject if the p-value from the experimental outcome is lower.
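To see what "how often we're comfortable being wrong" means in practice, here is a sketch of the decision rule for $100$ flips. It also computes exactly how often a genuinely fair coin would be rejected. The function names are mine, and rejecting when the p-value is at or below the threshold is an assumption about where the boundary case falls.

```python
from math import comb


def two_sided_p_value(heads: int, n: int = 100) -> float:
    """Two-sided p-value for the observed head count under a fair coin."""
    distance = abs(heads - n / 2)
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance)
    return extreme / 2**n


def reject_null(heads: int, alpha: float = 0.05, n: int = 100) -> bool:
    """Decision rule: reject 'the coin is fair' when the p-value is at most alpha."""
    return two_sided_p_value(heads, n) <= alpha


def false_rejection_rate(alpha: float = 0.05, n: int = 100) -> float:
    """Exact probability that a fair coin lands in the rejection region."""
    rejected = sum(comb(n, k) for k in range(n + 1) if reject_null(k, alpha, n))
    return rejected / 2**n


print(reject_null(60), reject_null(75))
print(false_rejection_rate())
```

Because head counts are discrete, the actual false-rejection rate comes out somewhat below the nominal $5\%$ rather than exactly equal to it.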

This is the sensible (though not unflawed) logic of hypothesis testing. And it explains why low p-values mean we should reject.
