Question about test function


I'm reading this book and on page 66, the author gives this example about hypothesis testing:

I don't understand why the author has defined the test function in this way. Why did he choose $\phi(8)=67/75$? Why is the size exactly 0.05?


14
On BEST ANSWER

It's a randomized test, which is not very common in practice. Running through the possible cutoffs $k$ for $Bin(10;0.5)$, no rejection region of the form $X\ge k$ has size exactly $\alpha=5\%$. Thus you reject with probability 1 if $X>8$ and with probability $p$ if $X=8$.

How to calculate $p$?

$$1\cdot\mathbb{P}[X=10]+1\cdot\mathbb{P}[X=9]+p\cdot\mathbb{P}[X=8]+0\cdot\mathbb{P}[X<8]=0.05$$ That is

$$0.0009765625+0.0097656250+p\cdot 0.0439453125=0.0500000000$$

$$p=0.8933333333=\frac{67}{75}$$
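The same arithmetic can be checked with exact fractions. This is only a minimal sketch in Python; the names `pmf_null` and `alpha` are illustrative choices, not notation from the book.

```python
from fractions import Fraction
from math import comb

n = 10                   # number of trials in the example
alpha = Fraction(1, 20)  # target size 0.05

def pmf_null(k: int) -> Fraction:
    """P[X = k] for X ~ Bin(10, 1/2), kept as an exact fraction."""
    return Fraction(comb(n, k), 2 ** n)

# Solve 1*P[X=10] + 1*P[X=9] + p*P[X=8] = alpha for p.
p = (alpha - pmf_null(10) - pmf_null(9)) / pmf_null(8)
print(p, float(p))
```

Working with `Fraction` rather than floats avoids rounding, so the result can be compared directly with $\frac{67}{75}$.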

In real statistical work, once you know that $X=8$ has a $p_{value}=5.47\%$, you have enough information to make your decision. Nobody is interested in getting "exactly" $\alpha=5\%$.

1
On

The rejection criterion is $\Pr_\theta[X \ge k_\alpha] \le \alpha$, and since under the null hypothesis $X$ is binomial with $\theta = \frac{1}{2}$, this probability is given by the sum

$$\Pr[X \ge k_\alpha \mid \theta = 1/2] = \sum_{x=k_\alpha}^n \binom{n}{x} \theta^x (1 - \theta)^{n-x} = \frac{1}{2^n} \sum_{x = k_\alpha}^n \binom{n}{x} = \frac{1}{2^{10}} \sum_{x = k_\alpha}^{10} \binom{10}{x} \le \alpha.$$

When we calculate this sum for various choices of $k_\alpha$, we get the table

$$\begin{array}{c|c|c} k_\alpha & \Pr[X = k_\alpha \mid \theta = 1/2] & \Pr[X \ge k_\alpha \mid \theta = 1/2] \\ \hline 10 & \frac{1}{1024} & \frac{1}{1024} \approx 0.000976563 \\ 9 & \frac{10}{1024} & \frac{11}{1024} \approx 0.0107422 \\ 8 & \frac{45}{1024} & \frac{56}{1024} \approx 0.0546875 \\ 7 & \frac{120}{1024} & \frac{176}{1024} \approx 0.171875 \\ \vdots & & \\ \end{array}$$

and the rightmost column contains the values given in the problem. The problem then is that the rejection criterion $X \ge 8$ cannot be used, because the resulting Type I error is slightly too large (by $0.0046875$) for a test of size $\alpha = 0.05$. The criterion $X \ge 9$, while resulting in a level $\alpha$ test, would have less power to reject the null than a test of size $\alpha$: in some cases, when the truth is $\theta > 1/2$ and we should reject the null, we cannot, because we might observe only $X = 8$ and the criterion requires $X \ge 9$.
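The tail probabilities in the table can be reproduced with a few lines of Python. This is a sketch under the example's assumptions ($n = 10$, $\theta = 1/2$, $\alpha = 0.05$); `upper_tail` and `k_alpha` are names made up for this illustration.

```python
from fractions import Fraction
from math import comb

def upper_tail(k: int, n: int = 10) -> Fraction:
    """P[X >= k] for X ~ Bin(n, 1/2), as an exact fraction."""
    return Fraction(sum(comb(n, x) for x in range(k, n + 1)), 2 ** n)

alpha = Fraction(1, 20)

# Rows of the table for k = 10, 9, 8, 7.
# Fraction prints in lowest terms, so 56/1024 appears as 7/128.
for k in range(10, 6, -1):
    tail = upper_tail(k)
    verdict = "level alpha" if tail <= alpha else "too large"
    print(k, tail, float(tail), verdict)

# Smallest cutoff k for which the non-randomized test X >= k has level alpha.
k_alpha = min(k for k in range(0, 11) if upper_tail(k) <= alpha)
```

The last line picks out the cutoff of the conservative non-randomized test, which is the one that gives up power relative to a test of size exactly $\alpha$.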

So the remedy that is proposed is to employ a randomization step to determine when to reject the null when $X = 8$, in such a way that the size of the test is exactly equal to $\alpha$. To do this, the example proposes to reject with some probability $p < 1$ when $X = 8$. This is the "flip a coin" scenario they describe. Now, it makes sense that, because the null probability of the rejection region $X \ge 8$ is only slightly larger than $\alpha$, when we observe $X = 8$, we should be okay to reject the null most of the time, but not all of the time. In other words, $p$ should be closer to $1$ than to $0$. To determine exactly what value $p$ should be so that the Type I error is exactly $0.05$, we need to go back to the table and observe that the rightmost column is formed by cumulatively adding up the second column, i.e., it is a running total. So the required value of $p$ is the one for which adding $p$ times the probability of $X = 8$ to the running total through $X = 9$ gives $0.05$; that is to say,

$$\frac{1}{1024} + \frac{10}{1024} + \frac{45}{1024} p = 0.05 = \frac{1}{20}.$$

Solving this gives the claimed

$$p = \frac{67}{75}.$$

So when we see $X = 8$, we flip that coin and reject the null with probability $67/75$, and the resulting Type I error is exactly $0.05$.
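To see that the randomized test really has size $0.05$, one can write the test function out and compute its expected value under the null. The following is a hedged Python sketch; `phi` follows the $\phi$ in the question, while `rejection_probability` is a helper name invented here.

```python
from fractions import Fraction
from math import comb

N = 10  # number of trials in the example

def phi(x: int) -> Fraction:
    """Randomized test function: probability of rejecting H0 given X = x."""
    if x >= 9:
        return Fraction(1)
    if x == 8:
        return Fraction(67, 75)
    return Fraction(0)

def rejection_probability(theta: Fraction) -> Fraction:
    """E_theta[phi(X)] for X ~ Bin(10, theta), computed exactly."""
    terms = (
        phi(x) * comb(N, x) * theta ** x * (1 - theta) ** (N - x)
        for x in range(N + 1)
    )
    return sum(terms, Fraction(0))

# Size of the test: rejection probability under the null theta = 1/2.
size = rejection_probability(Fraction(1, 2))
print(size)
```

Evaluating `rejection_probability` at other values of $\theta$ gives the power function of the same test.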

However, this sort of procedure, while theoretically sound, certainly should cause the statistician some degree of discomfort. After all, the sample contained all the evidence that we had about $\theta$, and that sample does not change within a given experiment. Yet, depending on the outcome of a random coin toss, one that has nothing to do with the underlying process that generated the sample itself, we make a decision about whether to reject the null. This hardly seems "reasonable" from the perspective of making a statistical inference. Another way to understand this is, if you conducted the experiment for $n = 10$ and found $X = 8$, and I conducted my own experiment with $n = 10$ and also found $X = 8$, you might, on the basis of the same evidence, conclude that $\theta > 1/2$ but I might not make that conclusion because my coin toss outcome was not the same as yours. Yet the coin has nothing to do with the experiment that generated the outcome $X = 8$ in both cases. This is why we do not see such randomized testing procedures commonly performed.

4
On

Considering the several questions you are asking, I suppose you are a little confused by this kind of problem. I will try to explain how to proceed in this case and to answer all your questions.

First of all, given @heropup's very good explanation of why the randomized test is of little practical use, let's solve the following problem with the more common non-randomized test. If you want, you can randomize it anyway...

The problem: I played a "fair coin game" with a friend. I chose Heads, he chose Tails. We tossed "his fair coin", betting $\$1,000$ on each toss, and these were the results:

$$\{H-T-T-T-T-H-T-T-T-T\}$$

Result: in about a minute I lost 6 thousand dollars!!

The doubt:

Friend or not, I am wondering whether this was really a fair coin or whether it was biased in favour of Tails (he chose both Tails and the coin).

The solution:

First of all, I plot the binomial $Bin(10;0.5)$, which is the distribution of the number of Tails under the null hypothesis that my friend is not a robber...

[Figure: bar chart of the $Bin(10;0.5)$ distribution of the number of Tails]

As you can see, the probability of getting 8 or more Tails in 10 tosses is about $5.5\%$.

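This is the Type I error probability of the test that rejects for 8 or more Tails (and also the p-value of the observed sample): the probability of rejecting $H_0$ when it is true, in favour of the one-tailed alternative $H_1$ that the coin is biased in his favour.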

You can randomize the test to get exactly 5% rather than 5.47%, as in the example, but you gain no additional information. The test is "borderline significant" but not highly significant... if you set a Type I error of 1%, your p-value of 5.47% is not enough to claim your money back...

Now suppose we fix an alternative hypothesis $H_1: \theta=0.8$ and see what happens to our plot.

[Figure: bar chart of the distribution of the number of Tails under $H_1: \theta=0.8$, with the rejection region in yellow]

The yellow bars represent the probability of rejecting $H_0$ with the rejection region used above (8 or more Tails), computed under the alternative $H_1$: this is the power of the test, about $68\%$.
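Both numbers, the Type I error of about $5.5\%$ and the power of about $68\%$, come from the same upper-tail sum evaluated at two values of $\theta$. Here is a small Python sketch; the function name `upper_tail` is an illustrative choice, and the rejection region "8 or more Tails" is the one assumed in this answer.

```python
from fractions import Fraction
from math import comb

def upper_tail(k: int, theta: Fraction, n: int = 10) -> Fraction:
    """P[X >= k] for X ~ Bin(n, theta), as an exact fraction."""
    terms = (
        comb(n, x) * theta ** x * (1 - theta) ** (n - x)
        for x in range(k, n + 1)
    )
    return sum(terms, Fraction(0))

type_one_error = upper_tail(8, Fraction(1, 2))  # under H0: theta = 1/2
power = upper_tail(8, Fraction(4, 5))           # under H1: theta = 0.8
print(float(type_one_error), float(power))
```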

The conclusion

The test is borderline and not highly significant, so I remain in doubt because I'm not 99% sure of the robbery. Next time I will probably spend my $\$6,000$ on a cruise to the Caribbean and not on gambling.


The way I prefer to approach this problem is the Bayesian one: I assume a Uniform prior on $\theta$ and evaluate the posterior odds of $\theta<\frac{1}{2}$ against $\theta>\frac{1}{2}$:

$$\frac{\int_0^{1/2}\theta^8(1-\theta)^2d\theta}{\int_{1/2}^1\theta^8(1-\theta)^2d\theta}=\frac{67}{1981}$$

Since the denominator is greater than the numerator, my decision is in favour of the denominator, that is, $\theta>\frac{1}{2}$.
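The ratio of the two integrals can be verified exactly, since the integrand $\theta^8(1-\theta)^2 = \theta^8 - 2\theta^9 + \theta^{10}$ is a polynomial. The sketch below is only a check of the arithmetic; `antiderivative` and `posterior_odds` are names introduced here for illustration.

```python
from fractions import Fraction

def antiderivative(t: Fraction) -> Fraction:
    """Antiderivative of t^8 (1 - t)^2 = t^8 - 2 t^9 + t^10, zero at t = 0."""
    return t ** 9 / 9 - 2 * t ** 10 / 10 + t ** 11 / 11

zero, half, one = Fraction(0), Fraction(1, 2), Fraction(1)

numerator = antiderivative(half) - antiderivative(zero)   # mass on theta < 1/2
denominator = antiderivative(one) - antiderivative(half)  # mass on theta > 1/2

posterior_odds = numerator / denominator
prob_biased = denominator / (numerator + denominator)     # P[theta > 1/2 | data]
print(posterior_odds, prob_biased)
```

The normalizing constant of the posterior cancels in the ratio, which is why the unnormalized integrand is enough.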