What is the correct statistical language to conclude using type II error?


Update:

This question arose while I was reading a neuroscience paper. In the natural-science community, people are generally less careful about the correctness of statistical language. I am well aware of the definitions of Type I and Type II errors. In statistics classes, we were taught to say either "reject the null" or "fail to reject the null" at a given significance level, which controls the Type I error rate, but we were rarely taught the language used to describe Type II error.

The author hypothesized that two samples are from the same distribution. The statistical test was performed using the following hypotheses.

The null hypothesis: the two samples are from the same distribution.

The alternative hypothesis: the two samples are from different distributions.

The author got a p-value=0.83. Here the p-value is the same as the type I error rate. At significance level=0.05, we fail to reject the null hypothesis. That is, there is 83% chance that they are from different distributions when we say they are not.

When neuroscientists see something like this, they are more or less convinced that the two samples are from the same distribution. Of course, as a mathematician, I find this far from rigorous.

I am wondering whether calculating the Type II error rate would strengthen the argument. If my Type II error rate is very small, say 0.03, what would be the correct statistical language to draw a conclusion?

Basically, I want to know what a high p-value together with a low Type II error rate implies about the null/alternative, and the rigorous statistical language to describe it.

3 Answers

Accepted answer (score 2):

We could obtain a high p-value of $0.83$ if the two distributions $F$ and $G$ are equal or nearly equal, or if the sample size was too small for this type of test to detect the difference. By determining $\beta$, or the power $1-\beta$, we can gain some information. Unfortunately, that is not always easy.

The Type II error rate is, in general, a function, not just a single value. As drphil pointed out, you choose a Type I error rate (along with the specific hypothesis test and sample size), and then the Type II error rate(s) are determined. We can use that information (if easily computable) to choose between test statistics, e.g., the chi-square or Kolmogorov-Smirnov test; that is the quest for uniformly most powerful tests. Or we can use it to determine the sample size needed to achieve chosen $\alpha$ and $\beta$ values. Here you would have to specify a particular alternative hypothesis for the $\beta$ value.

In the example you cite, if the alternative is true then $F\ne G$, but that alone is not enough to compute $\beta.$ Suppose we want to detect a difference between $F = \text{Exponential}(\text{mean } 1)$ and $G = \text{Exponential}(\text{mean } 5)$. With these $F,G$ as the population CDFs, we need to find the probability of rejecting $F=G$ using our particular test; that is the power $1-\beta.$ If that is an important difference to detect, then we might require $1-\beta = 0.95$ or higher and select the sample size to achieve it (this might have to be done by simulation). This will then fix the $\beta$ values for all other types of alternative hypotheses.
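The simulation suggested above can be sketched in Python. Everything here is an illustrative assumption, not from the answer: the hand-rolled two-sample KS statistic, the usual asymptotic critical constant 1.358 for $\alpha = 0.05$, and the sample size $n = m = 30$.

```python
import math
import random

random.seed(0)  # reproducible illustration

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two ECDFs."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    i = j = d = 0
    while i < n and j < m:
        if x[i] <= y[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_rejects(x, y, c_alpha=1.358):
    """Asymptotic KS test at alpha = 0.05 (c_alpha is the tabulated constant)."""
    n, m = len(x), len(y)
    return ks_statistic(x, y) > c_alpha * math.sqrt((n + m) / (n * m))

# Estimate the power 1 - beta against F = Exponential(mean 1) vs
# G = Exponential(mean 5), with n = m = 30 observations per sample.
trials = 500
rejections = sum(
    ks_rejects([random.expovariate(1.0) for _ in range(30)],    # mean 1
               [random.expovariate(1 / 5) for _ in range(30)])  # mean 5
    for _ in range(trials)
)
power = rejections / trials
print(f"estimated power: {power:.2f}")
```

For a difference this large the estimated power comes out close to 1; in practice one would use a library routine such as `scipy.stats.ks_2samp` rather than a hand-rolled statistic, but the power-by-simulation idea is the same.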

So, in summary, the power function of the hypothesis test is what you are looking for. There is no further "reject / do not reject" decision.

P.S. The interpretation of $0.83$ is not correct:

"83% chance that they are from the same distribution when we say they are not." That is (almost) the statement of a Type II error probability. The correct interpretation is: if we had used $\alpha \gt 0.83$ then we would have Rejected the null hypothesis. Or it is the probability of getting a sample result more extreme (more favorable to the alternative) than the result we actually obtained.

Answer (score 7):

I do not think that a null hypothesis should ever be accepted. Either the null hypothesis is rejected or you fail to reject it. Which error you want to minimize determines how readily you are going to reject the null hypothesis.

Essentially, we are assuming the null hypothesis and seeing if there is enough information to reject it. There's a fairly detailed discussion on Minitab's website.

Wikipedia actually gives a great example: $H_0$ is innocence. Under "innocent until proven guilty," you want to minimize wrongful rejection of $H_0$.

Answer (score 3):

You definitely do not want to accept a null hypothesis. I said that at a conference once, and a bunch of statisticians slashed my tires.

Consider:

$H_0$: $\mu_1 - \mu_2 = 0$

$H_a$: $\mu_1 - \mu_2 \ne 0$

A Type I error is to reject $H_0$ when $H_0$ is true; a Type II error is to fail to reject $H_0$ when $H_0$ is not true. The p-value measures the weight of evidence against the null hypothesis (the smaller the p-value, the more evidence against the null) and is also called the observed significance level. We reject the null hypothesis if the p-value is less than or equal to the chosen significance level $\alpha$, which is the Type I error rate we are willing to tolerate. Since a p-value of 0.83 is not less than or equal to $\alpha = 0.05$, the conclusion is to fail to reject the null hypothesis.
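The decision rule above is mechanical enough to express as a one-liner; this small sketch (the function name `decide` is my own, not from the answer) just restates it:

```python
def decide(p_value, alpha=0.05):
    """Decision rule: reject H0 iff the p-value is at most the significance level."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.83))        # the paper's result at alpha = 0.05
print(decide(0.83, 0.90))  # only an absurdly large alpha would reject here
```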

The probability of a Type II error is a function of the parameter values under the alternative hypothesis $H_a$. Generally, we weigh the cost of a Type I error against the cost of a Type II error. If a Type I error is the more costly of the two, we choose a lower Type I error rate (say, $\alpha = 0.01$), which results in a higher chance of making a Type II error. Conversely, if a Type I error is the less costly, we choose a higher Type I error rate (say, $\alpha = 0.10$), which results in a lower chance of making a Type II error.
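The trade-off can be seen numerically for a simple case. This is only a sketch: a two-sided z-test of $H_0: \mu = 0$ with known $\sigma$, and the effect size $\delta = 0.5$, $\sigma = 1$, $n = 25$ are illustrative numbers of my own, not from the answer.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def type2_error(alpha, delta=0.5, sigma=1.0, n=25):
    """Beta for a two-sided z-test of H0: mu = 0 when the true mean is delta."""
    z_crit = Z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    shift = delta * n ** 0.5 / sigma       # standardized true effect
    power = (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)
    return 1 - power                       # beta = 1 - power

beta_strict = type2_error(alpha=0.01)  # stricter alpha -> larger beta
beta_loose  = type2_error(alpha=0.10)  # looser alpha  -> smaller beta
print(f"alpha=0.01: beta={beta_strict:.3f}; alpha=0.10: beta={beta_loose:.3f}")
```

Lowering $\alpha$ from 0.10 to 0.01 roughly doubles $\beta$ in this setup, which is exactly the cost trade-off described above.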