I know, by definition, what a p-value and a Type I error are. However, I have a hard time relating those two concepts when rejecting a null hypothesis.
Below is my understanding of the p-value and the Type I error:
1) A p-value is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct. (from Wikipedia)
2) A Type I error is incorrectly rejecting a true null hypothesis. That is, rejecting the null when in fact the null is true.
My question is: why do we reject the null hypothesis when the p-value is less than the Type I error rate $\alpha$? What is the intuition behind this? What am I missing? After studying statistics for a year, I still have no idea how this works.
Thanks.
Discrete example (one-tailed test): $T \sim \mathsf{Pois}(\lambda).$ Test $H_0: \lambda = 10$ vs. $H_a: \lambda > 10.$
Because $P(T \ge 16\,|\,\lambda=10) = 1 - P(T \le 15\,|\,\lambda=10) = 0.0487,$ a test at significance level $\alpha = 0.0487 = 4.87\%$ rejects $H_0: \lambda = 10$ in favor of $H_a: \lambda > 10$ when $T > c = 15,$ where $c$ is called the critical value of the test. The computation in R uses `ppois`, the Poisson CDF. Thus, if you observe $T = 20 > c,$ then you reject $H_0$ at level $\alpha.$
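The original R output did not survive here; as a stand-in sketch (Python with scipy assumed, rather than R's `ppois`), the significance level can be checked as follows:

```python
from scipy.stats import poisson

# Significance level of the test that rejects when T > c = 15:
# alpha = P(T >= 16 | lambda = 10) = 1 - P(T <= 15 | lambda = 10)
alpha = 1 - poisson.cdf(15, mu=10)
print(round(alpha, 4))  # 0.0487
```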
However, if you observe $T = 20,$ then the P-value is the probability $P(T \ge 20\,|\,\lambda = 10) = 1 - P(T \le 19\,|\,\lambda = 10) = 0.0035,$ and you can say you reject at level $\alpha$ because the P-value is less than $\alpha.$
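The corresponding R computation is also missing; a Python equivalent of the P-value calculation (scipy assumed) is:

```python
from scipy.stats import poisson

# P-value for observed T = 20: P(T >= 20 | lambda = 10) = 1 - P(T <= 19 | lambda = 10)
p_value = 1 - poisson.cdf(19, mu=10)
print(round(p_value, 4))  # 0.0035
```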
The sum of the heights of the bars in the plot to the right of the vertical dotted line at the critical value is the significance level $\alpha.$ The P-value is the sum of the heights of the bars to the right of the observed value $T = 20$ (solid line).
Continuous example (two-tailed test): Suppose we have a sample `x` of $n = 20$ observations from a normal population with unknown mean and variance. We wish to test $H_0: \mu = 100$ against $H_a: \mu \ne 100,$ at the 5% level.
In R, `t.test` handles this situation. The sample mean $\bar X = 107.15$ is greater than the hypothetical mean $\mu = 100.$ The question is whether the difference is large enough to warrant rejecting $H_0.$ According to the t test, the P-value is $0.033 < 0.05 = 5\%,$ so we reject $H_0$ at the 5% level.
Computer output does not always give the critical value. In this particular example the test statistic is distributed as Student's t with $\nu = 19$ degrees of freedom. The critical values for this two-tailed test are $\pm 2.093,$ where $2.093$ cuts area $0.025$ from the upper tail of $\mathsf{T}(\nu=19),$ as computed with R's `qt` (or obtainable from printed tables of t distributions). Knowing the 5% critical values, we can see that $H_0$ is rejected at the 5% level because the observed $T = 2.2931$ does not lie between $\pm 2.093.$
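The R `qt` output is not shown here; the same critical value can be checked in Python (scipy assumed, a stand-in for the original R code):

```python
from scipy.stats import t

# Two-tailed 5% critical value for Student's t with 19 degrees of freedom;
# R equivalent: qt(0.975, 19)
crit = t.ppf(0.975, df=19)
print(round(crit, 3))  # 2.093
```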
Computer output usually shows a P-value, which can be used to decide whether to reject at any desired level of significance. In the example above, the observed value of the t statistic is $T = 2.2931$ (which you can check by hand from the summary statistics). The P-value is the probability of an outcome as extreme or more extreme (in either direction from 0). It is computed as $0.0341$ in R, using `pt` (the Student t CDF).
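A Python check of this two-tailed P-value (scipy assumed; the original R call would be `2 * pt(-2.2931, 19)`):

```python
from scipy.stats import t

# Two-tailed P-value from the rounded test statistic T = 2.2931 with 19 df;
# both tails are counted, hence the factor of 2 (value is about 0.03-0.04)
p_value = 2 * t.sf(2.2931, df=19)
print(p_value)
```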
Knowing that the P-value is smaller than 5%, we can say that $H_0$ is rejected at the 5% level.
[Typically, one can only roughly approximate the P-value from printed tables, but exact P-values are ordinarily computed by software. The very small difference from the P-value in the R output is due to rounding; the output rounds the observed value of $T$ to four places.]
In the figure below, the significance level $\alpha = 0.05$ is the sum of the two tail areas outside the vertical dotted lines. The P-value is the area to the right of the solid black vertical line, plus the area to the left of the dashed line on the left (just as far from 0 on the other side).
Note: Even though the data for the second example were sampled from $\mathsf{Norm}(101, 15),$ the sample mean turned out to be $\bar X = 107.2,$ which is not surprising given the small sample size.
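The original 20 data values are not reproduced here and cannot be reconstructed; purely as an illustration, a comparable sample can be simulated and tested in Python (numpy and scipy assumed; the seed is arbitrary, so the numbers will not match the original):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)           # arbitrary seed, NOT the original data
x = rng.normal(loc=101, scale=15, size=20)  # sample of n = 20 from Norm(101, 15)

# Two-sided one-sample t test of H0: mu = 100
t_stat, p_val = stats.ttest_1samp(x, popmean=100)
print(round(x.mean(), 2), round(t_stat, 4), round(p_val, 4))
```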