Selecting rejection regions with a two-sided alternative hypothesis


Let $x$ be an observation of $X \sim Bin(n,p)$. We want to test the null hypothesis $H_0: p=p_0$. Give the appropriate type of rejection region when

a) $H_1: p<p_0$ (I would appreciate comments on my solution. Have I done this correctly?)

b) $H_1: p>p_0$ (I would appreciate comments on my solution. Have I done this correctly?)

c) $H_1: p\ne p_0$ (I need help with this one)

I know that the values in this region are the values of the test variable that indicate that the alternative hypothesis $H_1$ is true. Furthermore, I know that the test variable is often based on an estimate of the parameter, in our case the parameter $p$. To obtain the estimate, we can use the "method of moments", ML-estimation (since the whole distribution is known) or MK-estimation. I will use the "method of moments", which tells us that the estimate of the parameter $p$ is the solution to the equation $$E(X)=m(p)$$ $$m(p)=\frac{1}{n}\sum_1^nx_i$$ where $m$ is a known function of the unknown parameter $p$. $$E(X)=np=m(p)$$ $$np=\frac{1}{n}\sum_1^nx_i=x$$ The right-hand side of the last equation equals the observation $x$ since we have a sample of size $n=1$. Therefore the estimate of $p$ is $$p^*=\frac{x}{n} \qquad (1)$$ From $E(X)$ we get that $E(p^*)=\frac{E(X)}n=p$. The estimate of $p$ is an observation of the estimator $p^*(X) \sim Bin(1,p) \qquad (2)$

Solution: The rejection region is determined by the choice of the alternative hypothesis $H_1$.

"(" and ")" stand for "{" and "}" respectively in the set $C$.

a) $C=(T\le K)=(p^*\le K)=(x\le K)$ where $n=1$ in $(1)$ because of $(2)$.

b) $C=(T\ge K)=(p^*\ge K)=(x\ge K)$ where $n=1$ in $(1)$ because of $(2)$.

c) I get stuck here. I know that the alternative hypothesis is two-sided and that the rejection region should contain both large and small values of the test variable. But I can't figure out the region. The book says that $$C=(x:|x-np_0|\ge K)$$ Why $np_0$ and not just $p_0$?


It seems to me you are saying that the number of successes in $n$ trials is $X \sim Bin(n, p)$, which is true; and that the sample proportion of successes $\hat p = X/n \sim Bin(1, p)$, which is not. I have no idea what $T$ is. And $\{\hat p \le K\}$ is not the same event as $\{X \le K\}$. This question has been sitting here unanswered for several days, so let's try for a fresh start.

For test (a) of the null hypothesis that the true $p$ equals the hypothetical $p_0$ against the alternative that $p$ is smaller than $p_0$, I think it is best to stick with the observed number $x$ of successes to express the rejection region. The hypothetical expected number of successes is $np_0$. So you would reject the null hypothesis if the observed number $x$ of successes is much smaller than expected. This can be written as $x - np_0 \le K$, where $K$ is a suitably chosen negative constant.

Example: Suppose it is claimed a coin is fair (that is, $P(Heads) = p_0 = 1/2$), that you intend to bet on Heads, and you fear that the true $p$ is smaller than 1/2. So you toss the coin $n=100$ times, and observe $x = 39$ Heads. The expected number of heads is $np_0 = 100/2 = 50$. Of course you wouldn't insist on getting exactly 50 heads, but 39 does seem a lot less than expected.

If the coin is fair, then the number of heads is $X \sim Bin(100, 1/2)$. In that case $P\{X \le 39\} = 0.0176$. (I got this using the statement 'pbinom(39, 100, .5)' in R software.) So it would be very rare to get such a small number of heads using a fair coin and you decide the coin is unfair.
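The tail probability quoted above can be cross-checked without R. The following is a minimal sketch in Python that sums the binomial probabilities directly; the helper name `binom_cdf` is introduced here for illustration and is not part of any library.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p), summed term by term from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Probability of 39 or fewer heads in 100 tosses of a fair coin.
p_value = binom_cdf(39, 100, 0.5)
print(round(p_value, 4))  # 0.0176
```

This plays the same role as `pbinom(39, 100, .5)` in R.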

Where do you draw the line? How small does the observed number of heads have to be in order for you to reject the null hypothesis? A commonly used criterion is to judge such a 'tail' event as suspiciously rare if its probability is less than 5%. For a random variable $X \sim Bin(100, 1/2)$, one can find that $P\{X \leq 41\} = 0.0443$ (a 'rare' event) and $P\{X \leq 42\} = 0.0666$ (not quite 'rare' by the 5% criterion). This means that you will reject for any $x \leq 41$.
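The search for the cutoff can be automated. The sketch below, in Python, scans upward until the lower-tail probability first exceeds the 5% level; the function names `binom_cdf` and `lower_critical_value` are hypothetical helpers written for this illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def lower_critical_value(n, p0, alpha=0.05):
    """Largest c with P(X <= c) <= alpha under H0, or None if there is none."""
    c = None
    for k in range(n + 1):
        if binom_cdf(k, n, p0) > alpha:
            break
        c = k
    return c

n, p0 = 100, 0.5
c = lower_critical_value(n, p0)
print(c)                               # 41
print(round(binom_cdf(c, n, p0), 4))   # 0.0443
```

Because the binomial distribution is discrete, the attained level (0.0443) is usually strictly below the nominal 5%.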

Finally, suppose you want to focus on how far below the expected number of heads you are willing to go before you reject. Then you would let $K = 41 - 50 = -9$ and say that your rejection region is $x - np_0 \leq K$, where $K = -9$. (In the example above you reject because $x - np_0 = 39 - 50 = -11 \leq -9$.)

Now, I suppose you can figure out how to write the other two rejection regions.
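As a hedged check on your own attempt, here is one way the other two regions might be computed for the coin example ($n = 100$, $p_0 = 1/2$), sketched in Python. Two things are assumptions and not part of the discussion above: the two-sided test puts at most 2.5% in each tail (a common convention), and the helper names `lower_cutoff` and `upper_cutoff` are made up for this illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def lower_cutoff(n, p0, tail):
    """Largest c with P(X <= c) <= tail under H0 (None if no such c)."""
    total, c = 0.0, None
    for k in range(n + 1):
        total += binom_pmf(k, n, p0)
        if total > tail:
            break
        c = k
    return c

def upper_cutoff(n, p0, tail):
    """Smallest c with P(X >= c) <= tail under H0 (None if no such c)."""
    total, c = 0.0, None
    for k in range(n, -1, -1):
        total += binom_pmf(k, n, p0)
        if total > tail:
            break
        c = k
    return c

n, p0, alpha = 100, 0.5, 0.05
right = upper_cutoff(n, p0, alpha)     # (b): reject when x >= right
lo = lower_cutoff(n, p0, alpha / 2)    # (c): reject when x <= lo ...
hi = upper_cutoff(n, p0, alpha / 2)    # ... or when x >= hi (assumed equal tails)
print(right, lo, hi)  # 59 39 61
```

With $np_0 = 50$, region (b) reads $x - np_0 \ge 9$, and region (c) reads $|x - np_0| \ge 11$, which has the form $\{x : |x - np_0| \ge K\}$ given in the book. The centre is $np_0$ and not $p_0$ because $x$ counts successes, so it has to be compared with the expected count.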


This answer discusses the commonly used normal approximation to the binomial test of the original question.

When $np_0$ is moderately large (say greater than 5 or 10 depending on your fussiness for accuracy), it is common practice to use the normal approximation to the binomial distribution specified under the null hypothesis. To test the left-sided alternative (a), the approximate normal test goes as follows.

If $X \sim Bin(n, p_0)$ then $Z = (X - np_0)/\sqrt{np_0(1-p_0)}$ is approximately standard normal. In the example of my first answer, with $p_0 = 1/2$, $n = 100$, and observed number $x = 39$ of Heads, we have $\sqrt{np_0(1-p_0)} = \sqrt{25} = 5$ and so the observed value is $z = (39 - 50)/5 = -2.2$.

Then the probability (under the null hypothesis) of a number of Heads as far or farther below the expected 50 is found to be $\Phi(-2.2) = 0.0139$, where $\Phi$ denotes the CDF of the standard normal distribution, which is widely tabled. For practical purposes, this is not far from the exact binomial value 0.0176, obtained in my previous answer. [A continuity correction would use $(39.5 - 50)/5 = -2.1$ and $\Phi(-2.1) = 0.0179$, but few elementary texts insist on doing this.] The rejection region for the approximate normal test is $z < -1.645$, because $\Phi(-1.645) = 0.05$.
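For readers without a normal table at hand, the arithmetic for the coin example can be reproduced with a short Python sketch. It uses `statistics.NormalDist` from the standard library; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

n, p0, x = 100, 0.5, 39
sd = sqrt(n * p0 * (1 - p0))        # null standard deviation of X, here 5.0
phi = NormalDist().cdf              # standard normal CDF

z = (x - n * p0) / sd               # without continuity correction
z_cc = (x + 0.5 - n * p0) / sd      # with continuity correction

print(round(z, 2), round(phi(z), 4))        # -2.2 0.0139
print(round(z_cc, 2), round(phi(z_cc), 4))  # -2.1 0.0179
```

The continuity-corrected value is closer to the exact binomial probability 0.0176.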

Traditionally, this approximate method has been very widely used when $np_0 > 5$ because it is reasonably accurate and can be implemented using widely available tables of the standard normal CDF. By contrast, the exact method illustrated in my first answer often requires the use of software.

Notes: (1) For testing against a right-sided alternative, the rejection region of the approximate normal test is $z > 1.645$; for the two-sided alternative, the rejection region is $|z| > 1.96$. All three rejection regions are for tests at the 5% level of significance. (2) For comparison with the rejection regions of the exact binomial test in my first answer, notice that the difference $x - np_0$ is used in the numerator of $z$. (3) If $p_0 \ne 1/2$, the null binomial distribution is not symmetrical, and it is not always obvious how to find boundaries so that the two tail probabilities of the two-sided test total just less than 5%, but not more. The normal approximation can be seen as ignoring this difficulty rather than solving it.
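The three cutoffs in note (1) come from standard normal quantiles. A minimal Python sketch, again using `statistics.NormalDist` from the standard library (the variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1
alpha = 0.05

left = std_normal.inv_cdf(alpha)               # left-sided: reject if z < left
right = std_normal.inv_cdf(1 - alpha)          # right-sided: reject if z > right
two_sided = std_normal.inv_cdf(1 - alpha / 2)  # two-sided: reject if |z| > two_sided

print(round(left, 3), round(right, 3), round(two_sided, 2))  # -1.645 1.645 1.96
```

Changing `alpha` gives the corresponding cutoffs for other significance levels.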