Why do I have to use a fear as the Alternative-Hypothesis?

Introduction

I'd like to understand why I have to use a fear as the Alternative-Hypothesis and not as the Null-Hypothesis. I'm using my homework as an example, which says the following (I translated it myself, so I hope the exercise is understandable):

A big band finds out that one of their first 20 CDs has the wrong cover on
it. Now they're afraid that at least 5% of all covers are wrong. So they
want to run a significance test with a significance level of 10% and with
the Null-Hypothesis: "The share of wrong covers is smaller than 5%". If the
result shows that their fear is justified, they will ask the producer to sell the
CDs for a lower price.

a) Explain why they chose this Null-Hypothesis.

The solution says that they chose this Null-Hypothesis in order to be able to limit the risk of the error, but I'm wondering why it isn't possible to do that with the hypothesis "The share of wrong covers is at least 5%". That would lead to these two hypotheses:

$$ H_{0}: p \geq 0{,}05 \\ H_{1}: p < 0{,}05 $$

But the correct way would be:

$$ H_{0}: p < 0{,}05 \\ H_{1}: p \geq 0{,}05 $$

Now let's stick to the "correct way" for a moment and assume that I run a hypothesis test. I'd look at the boundary case of the Null-Hypothesis, so instead of $p < 0{,}05$ I calculate with $p = 0{,}05$. After that I can calculate the region that tells me when I can reject the Null-Hypothesis and when not, by finding out how many CDs have a wrong cover:

$$ P\left(X \geq k | p = 0{,}05\right) \leq 0{,}1 $$

So the region of rejection would be:

$$ R = \left\{k, \ldots, n\right\} $$

where:

  • $n$ is the number of CDs used in the hypothesis test,
  • $k$ is the smallest critical value at which we reject our Null-Hypothesis, or in other words: the minimal number of CDs with a wrong cover at which we reject the Null-Hypothesis, and
  • $X$ is the number of CDs which have a wrong cover.
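The search for $k$ described above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions: the exercise text quoted here does not state the sample size, so `n = 150` is assumed, and the helper names `upper_tail` and `smallest_upper_k` are made up for this sketch.

```python
from math import comb

def binom_pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(n, p, x) for x in range(k, n + 1))

def smallest_upper_k(n, p0, alpha):
    """Smallest k with P(X >= k | p = p0) <= alpha (hypothetical helper)."""
    for k in range(n + 2):  # k = n + 1 gives an empty region with probability 0
        if upper_tail(n, p0, k) <= alpha:
            return k

# Assumed values: n is NOT given in the quoted exercise text.
n, p0, alpha = 150, 0.05, 0.10
k = smallest_upper_k(n, p0, alpha)
print("k =", k, " P(X >= k | p = 0.05) =", upper_tail(n, p0, k))
```

The rejection region is then $\{k, \ldots, n\}$ for the printed $k$; with a different $n$ the value of $k$ changes, but the search stays the same.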

So far so good, that makes sense to me: we calculate the rejection-area to be able to conclude whether the result of our test would be the same as if we tested all the CDs they sold.

EDIT (1):

A small addition to the conclusion here: we have now calculated our rejection-area to be able to decide whether our fear (the Alternative-Hypothesis) is "true" (or "valid") by testing it on our dataset of $n$ CDs. If our dataset lands in the rejection-area, we can think that our fear (the Alternative-Hypothesis) is probably the "better" assumption than the Null-Hypothesis. But that could change if we ran more tests and the "hit-rate" of the Null-Hypothesis increased and became bigger than that of the Alternative-Hypothesis, which is why we can't say that this hypothesis is "the absolutely correct one". That's correct, right?

Question

Now I'm wondering: why can't I just use their fear as the Null-Hypothesis (my first two hypotheses)? I can do the same steps as in the "correct way" and get a rejection-area for my Null-Hypothesis as well. This would lead to the following hypothesis test:

$$ P\left(X \leq k | p = 0{,}05 \right) \leq 0{,}1 $$

After finding a suitable value for $k$, I'd have this rejection-area:

$$ R = \left\{0, \ldots, k\right\} $$

Which can be interpreted like this(?): "If at most $k$ of the $n$ CDs have a wrong cover, then I can be sure that at most 5% of all CDs have a wrong cover." Where is my thinking wrong?
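For comparison, the lower-tail search for this reversed test can be sketched the same way. Again, `n = 150` is an assumption (the quoted exercise does not state the sample size), and `lower_tail` and `largest_lower_k` are hypothetical helper names.

```python
from math import comb

def binom_pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def lower_tail(n, p, k):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(n, p, x) for x in range(0, k + 1))

def largest_lower_k(n, p0, alpha):
    """Largest k with P(X <= k | p = p0) <= alpha, or -1 if no such k exists."""
    k = -1
    while k + 1 <= n and lower_tail(n, p0, k + 1) <= alpha:
        k += 1
    return k

# Assumed values: n is NOT given in the quoted exercise text.
n, p0, alpha = 150, 0.05, 0.10
k = largest_lower_k(n, p0, alpha)
print("k =", k, " P(X <= k | p = 0.05) =", lower_tail(n, p0, k))
```

Here the rejection region is $\{0, \ldots, k\}$, so the reversed Null-Hypothesis is rejected only when very few wrong covers are found.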

Which external sources have you tried out?

I watched StatQuest's video about the alternative hypothesis, and in this part he explained that you can't assume that the alternative hypothesis is right, since it includes "all" alternative hypotheses (if we're using the correct way). I understood his explanation of his example, but I can't see a connection from his example to my problem/homework. Didn't I show that $H_{0}: p \geq 0{,}05$ is true if I found more than $k$ wrong CDs according to my (wrong) hypothesis test? Didn't I show in this hypothesis test that, no matter which other alternative hypotheses are included in the Alternative-Hypothesis, they all have $p \geq 0{,}05$?

Summary

The correct one

$$ H_{0}: p < 0{,}05\\ H_{1}: p \geq 0{,}05 $$

Our rejection-area:

$$ R = \left\{k, \ldots, n\right\} $$

Interpretation of our result:

  1. We get into our rejection-area $\to$ The probability of our null hypothesis is smaller than that of the alternative hypothesis.

    In other words:
    The probability that at most 5% of the CDs in a dataset have a wrong cover is smaller than the probability that at least 5% of the CDs have a wrong cover.

  2. We don't get into our rejection-area $\to$ The probability of our null hypothesis is greater than that of the alternative hypothesis.

    In other words:
    The probability that at most 5% of the CDs in a dataset have a wrong cover is greater than the probability that at least 5% of the CDs have a wrong cover.

The "false" one

$$ H_{0}: p \geq 0{,}05 \\ H_{1}: p < 0{,}05 $$

Our rejection area:

$$ R = \left\{0, \ldots, k\right\} $$

Interpretation of our result:

  1. We get into our rejection-area $\to$ The probability of our null hypothesis is smaller than that of the alternative hypothesis.

    In other words:
    The probability that at most 5% of the CDs in a dataset have a wrong cover is greater than the probability that at least 5% of the CDs have a wrong cover.

  2. We don't get into our rejection-area $\to$ The probability of our null hypothesis is greater than that of the alternative hypothesis.

    In other words:
    The probability that at most 5% of the CDs in a dataset have a wrong cover is smaller than the probability that at least 5% of the CDs have a wrong cover.

I get the same result in both cases, correct? So why is it not possible to choose the "false" one?

Answer

OK, so my friends and I "found" the reason now.

Let's look at the "wrong" hypotheses first:

$$ H_{0}: p \geq 0{,}05 \\ H_{1}: p < 0{,}05 \\ A = \left\{0, \ldots, k\right\} $$

So let's go through both cases:

  • Assume we land in our rejection-area $A$. This would mean that we reject $H_{0}$ because its probability is too small for our current dataset. So we assume that the probability that at most $5\%$ of the CDs are wrong is higher than the probability that at least $5\%$ are wrong. As a result, we will do "nothing", because we think that fewer than $5\%$ of all CDs have a wrong cover.

  • Assume we don't land in our rejection-area $A$. This would mean that the probability of $H_{0}$ is too big for us to reject this hypothesis. Of course you could say that we now have to go to the producer and demand that he reduce the price of each CD, since we showed that it's possible that at least $5\%$ of all CDs have a wrong cover.

    BUT:
    We calculated the rejection-area with $p = 0{,}05$ as the boundary probability! So what happens if $p$ is actually $0{,}9$? Then our rejection-area would actually be larger than the one we've set now! So we can't demand that the producer reduce the price of the CDs, even if we found $k + 1$ CDs with a wrong cover ($k$ is our calculated value for $p = 0{,}05$), because $p$ might actually be greater, and for $p = 0{,}99$, for example, $k + 1$ would actually lie inside the rejection-area.

So we wouldn't achieve anything with either result if we chose the hypotheses like that.

Now let's look at the "correct" hypotheses:
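To make the point about the rejection-area depending on $p$ concrete, here is a small sketch that recomputes the lower-tail critical value for several values of $p$. It assumes $n = 150$ and a 10% significance level; the helper names `lower_tail` and `largest_lower_k` are only illustrative.

```python
from math import comb

def lower_tail(n, p, k):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(0, k + 1))

def largest_lower_k(n, p, alpha):
    """Largest k with P(X <= k | p) <= alpha, or -1 if there is none."""
    k = -1
    while k + 1 <= n and lower_tail(n, p, k + 1) <= alpha:
        k += 1
    return k

n, alpha = 150, 0.10  # assumed sample size and significance level

# Critical value of the region {0, ..., k} for different values of p in H0
ks = {p: largest_lower_k(n, p, alpha) for p in (0.05, 0.5, 0.9)}
for p, k in ks.items():
    print(f"p = {p}: rejection-area is {{0, ..., {k}}}")
```

The region $\{0, \ldots, k\}$ grows as $p$ grows, so the value of $k$ computed at $p = 0{,}05$ is only the boundary case of $H_{0}$.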

$$ H_{0}: p < 0{,}05 \\ H_{1}: p \geq 0{,}05 \\ A = \left\{k, \ldots, 150\right\} $$

This would lead to the following cases:

  • Assume we land in our rejection-area $A$. This would mean that our hypothesis $H_{0}$ has too small a probability for our current dataset, and we can think that our hypothesis $H_{1}$ is "true", so we need to demand that the producer reduce the price of the CDs. You could compare this case with the second case of the wrong hypotheses and say: "But we don't know what $p$ is." But in this case it doesn't matter what $p$ is, since we assume that $p$ is smaller than $0{,}05$! If $p$ were $0{,}02$ or $0{,}04$, our rejection-area would just extend further to the left (so $k$ becomes smaller), and if our current test included $k$ wrong CDs, we would still be in the rejection-area if $p$ is actually smaller than $0{,}05$! That's why we have to choose our fear as the alternative hypothesis :)

  • Assume we don't land in our rejection-area $A$. This would mean that we can't reject $H_{0}$ because its probability is not small enough to be seen as "not possible" for our current dataset. As a result, we don't know what we should do, since both hypotheses are possible.
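The difference between the two setups can also be checked numerically. The sketch below assumes $n = 150$ (as in the rejection-area above) and a 10% significance level, and compares how often the band would wrongly demand a price cut when the true $p$ is below $0{,}05$: once with the "correct" test (demand if $X \geq k$), and once with the "wrong" test when a non-rejection is read as confirmation of the fear. The variable names are illustrative only.

```python
from math import comb

def pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def lower_tail(n, p, k):
    """P(X <= k)."""
    return sum(pmf(n, p, x) for x in range(0, k + 1))

def upper_tail(n, p, k):
    """P(X >= k)."""
    return sum(pmf(n, p, x) for x in range(k, n + 1))

n, p0, alpha = 150, 0.05, 0.10  # assumed test setup

# Critical values, both computed at the boundary p = 0.05
k_upper = next(k for k in range(n + 2) if upper_tail(n, p0, k) <= alpha)
k_lower = max((k for k in range(n + 1) if lower_tail(n, p0, k) <= alpha), default=-1)

# True p below 5%: any demand for a price cut is unjustified here
rates = {}
for p in (0.01, 0.02, 0.03, 0.04):
    demand_correct = upper_tail(n, p, k_upper)    # "correct" test rejects H0
    demand_wrong = 1 - lower_tail(n, p, k_lower)  # "wrong" test fails to reject H0
    rates[p] = (demand_correct, demand_wrong)
    print(f"p = {p}: correct test {demand_correct:.4f}, wrong test {demand_wrong:.4f}")
```

With the "correct" hypotheses, the chance of an unjustified demand stays below the significance level for every $p < 0{,}05$. With the "wrong" hypotheses, failing to reject $H_{0}$ is common even when the fear is false, so it cannot serve as evidence for the fear.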

I hope that this explanation is clear enough (and not wrong).