Derivation of hypothesis testing -- intuition?


Let's say I have data points $0.1$, $0.2$, $0.3$ coming from a normal distribution with mean $\mu$ and standard deviation $1$.

If I want to test the hypotheses $H_0: \mu = 0.15$ vs. $H_1: \mu > 0.15$, then the test statistic is $$T = \frac{\hat{\mu} - \mu}{1/\sqrt{n}} $$
where $\hat{\mu}$ is the sample mean. Under the null hypothesis,
$$T = \frac{\hat{\mu} - 0.15}{1/\sqrt{3}} \sim \mathcal{N}(0,1). $$

With my data, I know my observed test statistic is $T = \sqrt{3}\times 0.05$.
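For concreteness, here is that computation in Python (standard library only; the variable names are just my own):

```python
import math

data = [0.1, 0.2, 0.3]
n = len(data)
mu0 = 0.15                        # hypothesized mean under H0
xbar = sum(data) / n              # sample mean = 0.2
T = (xbar - mu0) / (1 / math.sqrt(n))
print(T)                          # sqrt(3) * 0.05 ≈ 0.0866
```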

Now my confusion here is the rationale for why we do the following:
The p-value is $p = \mathbb{P}(Z > \sqrt{3}\times 0.05)$ where $Z$ is standard normal. Why do we look at the probability that the theoretical statistic exceeds the observed test statistic? Why not when it doesn't exceed?

A similar question with the p-value is when we do the two-sided test ($H_1: \mu \neq 0.15$). The p-value will evaluate to
$$p = 2\mathbb{P}(Z > \sqrt{3}\times 0.05)$$
but if the two-sided alternative is more likely than the one-sided one (the alternative $\mu \neq 0.15$ covers both $\mu > 0.15$ and $\mu < 0.15$), shouldn't we intuitively be more likely to reject in the two-sided test? The p-value seems to give the opposite result: we reject when $p$ is small, and the factor of 2 makes it harder to reject. Why does my intuition break down here?
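Numerically, the factor of 2 just doubles the tail probability of the observed statistic (standard library only; `norm_sf` is the same helper as above):

```python
import math

def norm_sf(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

T_obs = math.sqrt(3) * 0.05            # observed statistic, ≈ 0.0866
p_one_sided = norm_sf(T_obs)           # ≈ 0.465
p_two_sided = 2 * norm_sf(abs(T_obs))  # doubles the tail, ≈ 0.931
print(p_one_sided, p_two_sided)
```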

> as we reject if p is small

This is probably the root of your confusion. The p-value doesn't determine whether we accept or reject the null hypothesis; the p-value is a descriptor of the test itself. The p-value tells us "how good" the test is (in a certain sense of "how good"), even before we perform the test; the result of the test will determine whether we accept or reject, but the p-value gives us information about how much we should "trust" that acceptance or rejection.

The p-value is the probability of a false positive, i.e., the probability that we would reject a true null hypothesis because the test showed an abnormal result. In a two-sided test, there are twice as many ways that a true null hypothesis could show an abnormal result; as you correctly stated, this makes rejection more likely, and since the p-value is the probability of rejection, this makes the p-value higher. Since the p-value is higher, we will be less likely to "trust" the rejection -- it could more easily have been a statistical anomaly under the null hypothesis.
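A quick simulation makes the "twice as many ways" point concrete: under a true null, a two-sided rejection region with the same cutoff traps roughly twice as many draws as the one-sided region (standard library only; the cutoff 1.645 is the usual one-sided 5% value for a standard normal):

```python
import math
import random

random.seed(0)
c = 1.645                          # cutoff with P(Z > c) ≈ 0.05
trials = 100_000
# draws of the test statistic under a *true* null hypothesis
one_sided = sum(random.gauss(0, 1) > c for _ in range(trials))
two_sided = sum(abs(random.gauss(0, 1)) > c for _ in range(trials))
print(one_sided / trials)          # ≈ 0.05
print(two_sided / trials)          # ≈ 0.10: twice as many "abnormal" results
```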

As for your first question, your calculation runs conceptually backward. Conceptually, we first determine the p-value, and only then run the test and observe the statistic. It's true that we can go the other way around -- first observe the statistic, then ask for the lowest p-value at which the observation would still lead to rejection -- but that direction is conceptually much more awkward, and that's probably why you're having trouble with the intuition.

In any case, the p-value calculation looks at the probability that the test statistic *exceeds* a certain threshold because we are computing the probability of rejection given a true null hypothesis, and rejection occurs precisely when the statistic exceeds that threshold. In the conceptually awkward post-observation calculation of the lowest possible p-value, we simply set the threshold equal to the observed value (since that is the highest threshold at which the observation would still lead to rejection) and carry out the same calculation.
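That "lowest p-value at which the observation still rejects" reading can be made concrete: if we choose the one-sided rejection threshold so that the test's tail probability equals the observed p-value, the threshold lands exactly on the observed statistic. A sketch (standard library only; `norm_isf` is a bisection helper I define here, not a library call):

```python
import math

def norm_sf(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def norm_isf(p, lo=-10.0, hi=10.0):
    """Inverse survival function by bisection: find c with P(Z > c) = p."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if norm_sf(mid) > p:       # tail too big: threshold must move right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

T_obs = math.sqrt(3) * 0.05
p = norm_sf(T_obs)                 # p-value of the observation
# the rejection threshold of a one-sided test at level p is exactly T_obs:
print(norm_isf(p))                 # ≈ 0.0866 = T_obs
```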