"Practical" Claim about Hypothesis Testing of Bernoulli Distribution Parameter


First, let me state the original problem (in my own wording):

Describe the decision procedure for testing the hypothesis about the parameter $p$ (success rate) of a Bernoulli distribution. The hypotheses are

\begin{gather} H_0: p = p_0 \\ H_1: p \ne p_0 \end{gather}

where $p_0$ is a fixed number. If $Y = \sum_{i=1}^n X_i$, where $X_i \sim \text{Bernoulli}(p)$ are i.i.d., is available, describe the decision procedure based on $Y$ that will guarantee that the probability of type 1 error does not exceed $\alpha$. (Assume $n$ is large.)

The first solution without the assumption that $n$ is large:

Find two critical values $Y_{lower}$ and $Y_{upper}$ such that $P(Y_{lower} < U < Y_{upper}) \ge 1 - \alpha$, where $U \sim \text{Binomial}(n, p_0)$. (Exact equality is usually unattainable for a discrete distribution, and there are many possible choices.) We accept $H_0$ if $Y_{lower} < Y < Y_{upper}$.
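As a concrete sketch (not part of the original problem), the exact procedure can be coded directly. The equal-tails rule below is just one of the many possible choices, and the values of `n`, `p0`, and `alpha` are illustrative:

```python
# Sketch of the exact (no large-n approximation) two-sided test of H0: p = p0.
# Equal-tails choice: allow at most alpha/2 of null probability in each tail
# of Binomial(n, p0).
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    return sum(binom_pmf(j, n, p) for j in range(0, k + 1))

def exact_critical_values(n, p0, alpha):
    """Largest y_lower with P(U <= y_lower) <= alpha/2 and smallest
    y_upper with P(U >= y_upper) <= alpha/2, for U ~ Binomial(n, p0)."""
    y_lower = -1
    while binom_cdf(y_lower + 1, n, p0) <= alpha / 2:
        y_lower += 1
    y_upper = n + 1
    while 1 - binom_cdf(y_upper - 2, n, p0) <= alpha / 2:
        y_upper -= 1
    return y_lower, y_upper

n, p0, alpha = 50, 0.4, 0.05
y_lo, y_hi = exact_critical_values(n, p0, alpha)
# Accept H0 iff y_lo < Y < y_hi; by construction P(type I error) <= alpha.
type1 = binom_cdf(y_lo, n, p0) + (1 - binom_cdf(y_hi - 1, n, p0))
```

Because the binomial is discrete, the attained type I error is typically strictly below $\alpha$; the equal-tails split simply guarantees it never exceeds $\alpha$.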

Then if $n$ is assumed large and we are allowed to approximate the distribution of $Y$ with a normal distribution, the method simplifies to

Find two critical values $Z_{lower}$ and $Z_{upper}$ such that $P(Z_{lower} < U < Z_{upper}) = 1 - \alpha$ where $U \sim \mathcal N(0, 1)$. We accept $H_0$ if $Z_{lower} < Z < Z_{upper}$, where $Z = \sqrt{n}\frac{(Y/n) - p_0}{\sqrt{p_0(1-p_0)}}$.

This simplifies the problem slightly if we take the symmetric interval around the mean, i.e., $Z_{lower} = -Z_{upper}$.
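A minimal sketch of this symmetric large-$n$ decision rule, with illustrative values of `n`, `p0`, and `y` (Python's `statistics.NormalDist` supplies the standard normal quantile):

```python
# Symmetric large-n approximate test: accept H0 iff |Z| < z_{1-alpha/2},
# where Z = sqrt(n) * (y/n - p0) / sqrt(p0 * (1 - p0)).
from math import sqrt
from statistics import NormalDist

def z_test_accepts(y, n, p0, alpha=0.05):
    z = sqrt(n) * (y / n - p0) / sqrt(p0 * (1 - p0))
    z_upper = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    return -z_upper < z < z_upper

# Example: n = 100, p0 = 0.5.  Observing y = 50 gives Z = 0 (accept);
# observing y = 65 gives Z = 3 > 1.96 (reject).
accept_50 = z_test_accepts(50, 100, 0.5)   # True
accept_65 = z_test_accepts(65, 100, 0.5)   # False
```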

Here comes my question:

I was told by a teacher that in practice, some people use $$ \tilde Z = \sqrt n \frac{(Y/n) - p_0}{\sqrt{(Y/n)(1 - (Y/n))}} $$ instead of $Z$. In other words, $p_0$ in the denominator of $Z = \sqrt{n}\frac{(Y/n) - p_0}{\sqrt{p_0(1-p_0)}}$ is replaced by $\hat p = \frac{Y}{n}$.

His explanation was that $\sqrt{p_0(1-p_0)}$ might not be representative of the true standard deviation, and that the estimated standard deviation $\sqrt{\hat p(1 - \hat p)}$ may be better. I don't think this reasoning is valid, because we are testing the hypothesis that $p = p_0$!

However, after some reflection, I am starting to think that using $\tilde Z$ might not be such a wrong thing because we did employ the central limit approximation, and using $\tilde Z$ might correct the approximation in a proper way. Of course this would somewhat contradict the assumption that the problem allows you to use normal approximation, but is this kind of correction (if it's really a correction) valid to some degree?

Why I think it might be a correction in the right direction: Suppose $\frac 12 < \hat p < p_0$. Then $\sqrt{\hat p(1 - \hat p)} > \sqrt{p_0(1 - p_0)}$, so $Z < \tilde Z < 0$, making it more probable to accept $H_0$ using $\tilde Z$ than $Z$. On the other hand, if $\frac 12 < p_0 < \hat p$, it becomes less probable to accept $H_0$ using $\tilde Z$.
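This direction argument is easy to check numerically. The values of `n`, `p0`, and `y` below are arbitrary choices satisfying $\frac12 < \hat p < p_0$:

```python
# Numeric check: with 1/2 < p_hat < p0 we should observe Z < Z_tilde < 0,
# because p_hat(1-p_hat) > p0(1-p0) shrinks the magnitude of the statistic.
from math import sqrt

def z_stat(y, n, p0):
    return sqrt(n) * (y / n - p0) / sqrt(p0 * (1 - p0))

def z_tilde_stat(y, n, p0):
    p_hat = y / n
    return sqrt(n) * (p_hat - p0) / sqrt(p_hat * (1 - p_hat))

n, p0, y = 100, 0.8, 65            # p_hat = 0.65, so 1/2 < p_hat < p0
z = z_stat(y, n, p0)               # = 10 * (-0.15) / 0.4     = -3.75
zt = z_tilde_stat(y, n, p0)        # = 10 * (-0.15) / 0.4770 ~= -3.14
# Indeed z < zt < 0: the "practical" statistic is less extreme here.
```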


Best answer:

When constructing a hypothesis test, you have to balance Type I error against Type II error. The problem is that the two are usually inversely related (as in this problem): if you decrease the probability of a Type I error, you increase the probability of a Type II error. So the way we construct hypothesis tests is to fix a threshold on the probability of Type I error that we will allow (since it is impossible to drive it to 0); this threshold is called the significance level. From there, we construct a test at that significance level that has the most power (where power $= 1 - P(\text{Type II error})$). In general, as in this case, you look for tests that actually achieve the significance level, since these tests have more power than ones whose Type I error probability is strictly below it.

Now, as you know, when we do hypothesis testing we assume the null hypothesis is true (innocent until proven guilty), and our test should reject the null hypothesis, when it is true, with probability no greater than our significance level (i.e. $P(\text{Type I error}) \leq \alpha$). This is why $Z$ usually performs better than $\tilde Z$. Remember our highest priority is to ensure that $P(\text{Type I error})\leq\alpha$. If the null hypothesis is true, the variance is exactly $Var(\frac{Y}{n})=\frac{p_{0}(1-p_{0})}{n}$, whereas $\frac{\hat p(1-\hat p)}{n}$ is only an estimate of it. Since $Z$ relies on fewer approximations than $\tilde Z$, its $P(\text{Type I error})$ will be closer to the nominal significance level.

The reason we even bring up $\tilde Z$ is that it is a general test for a sample average (which ends up being a proportion in the Bernoulli/Binomial situation); but since you can compute the exact variance under the null hypothesis here, it's not the best test to use.

Here are some resources on the Wald and score tests:
http://ocw.jhsph.edu/courses/methodsinbiostatisticsii/PDFs/lecture18.pdf
http://www.biostat.umn.edu/~dipankar/bmtry711.11/lecture_02.pdf

Another answer:

For hypothesis testing purposes, you always use the actual null distribution, which requires no estimation. Therefore, I am suspicious of your teacher's response.

However, your teacher may have been referring to how confidence intervals are formed, in which case you do use estimates. Your statistic looks like the beginnings of the basic "textbook" confidence interval for a sample proportion.

However, a basic Monte Carlo simulation of such an experiment shows that the "practical" statistic is inferior to the normal approximation in terms of Type I error. For example, I simulated an experiment involving the sum of 50 i.i.d. Bernoulli$(p_0)$ random variables. I used the 1.96 cutoff for the two-sided rejection region with both the "typical" and the "practical" $Z$ values. The latter gave a Type I error rate of 7%, while the former had only a 4% Type I error rate, which is clearly better.
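The answer doesn't include its code, but a simulation along these lines is easy to reconstruct. The choices of `p0`, seed, and trial count below are mine (the answer does not specify them), so the exact rates will differ from the 7%/4% reported; the qualitative conclusion is the same:

```python
# Monte Carlo under H0: simulate Y as a sum of n = 50 Bernoulli(p0) draws and
# compare the rejection rates of Z and Z_tilde with the 1.96 two-sided cutoff.
import random
from math import sqrt

random.seed(0)
n, p0, trials, cutoff = 50, 0.3, 20000, 1.96
reject_z = reject_zt = 0
for _ in range(trials):
    y = sum(random.random() < p0 for _ in range(n))
    z = sqrt(n) * (y / n - p0) / sqrt(p0 * (1 - p0))
    if abs(z) >= cutoff:
        reject_z += 1
    p_hat = y / n
    if 0 < p_hat < 1:                  # Z_tilde is undefined when Y = 0 or n
        zt = sqrt(n) * (p_hat - p0) / sqrt(p_hat * (1 - p_hat))
        if abs(zt) >= cutoff:
            reject_zt += 1
    # else: conservatively treat the undefined ratio as a non-rejection,
    # as the answer describes.

rate_z, rate_zt = reject_z / trials, reject_zt / trials
# Typically rate_zt > rate_z: the "practical" statistic over-rejects under H0.
```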

From a theoretical standpoint, we can compare the variance of the two Z scores:

$Var(Z)=Var\left(\sqrt{\frac{n}{p(1-p)}}\left(\frac{Y}{n}-p\right)\right)=\frac{n}{p(1-p)}\,Var\!\left(\frac{Y}{n}\right)=\frac{n}{p(1-p)}\cdot\frac{p(1-p)}{n}=1$

Now, let $\tilde Z_n=\sqrt{n}\,\frac{Y/n-p}{\sqrt{(Y/n)(1-Y/n)}}$

Asymptotically, $Var(\tilde Z_n)\xrightarrow{n\rightarrow \infty}1$ due to the consistency of the estimator.

However, note that for every finite $n$, $P(Y=n \cup Y=0)>0$, and on that event the denominator of $\tilde Z_n$ is zero, so $\tilde Z_n$ is undefined with positive probability and its variance does not exist for any $n<\infty$. Hence your teacher's "practical" method has an undefined statistic (in fact, I had to "rig" my simulations to conservatively treat undefined ratios as non-rejections so they would not run into NaN errors!)
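The degenerate case is easy to exhibit directly (illustrative values of `n` and `p0`):

```python
# If Y = n (all successes), then p_hat(1 - p_hat) = 0 and the "practical"
# statistic divides by zero.  Under H0 this event has probability p0**n > 0.
from math import sqrt

n, p0, y = 20, 0.5, 20             # Y = n occurs with probability 0.5**20 > 0
p_hat = y / n
denom = sqrt(p_hat * (1 - p_hat))  # = 0.0
try:
    zt = sqrt(n) * (p_hat - p0) / denom
except ZeroDivisionError:
    zt = None                      # a simulation must handle this case explicitly
```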

A third answer:

Yes, using $\widehat p=\frac{Y}{n}$ instead of $p_0$ is valid (though I think your teacher's reason is incorrect). The reason it is valid to use $\widehat p$ instead of $p_0$ is that the asymptotics are the same; i.e., under $H_0$ the hypothesis test is based on $$\sqrt{n}(\widehat p - p_0) \stackrel{d}{\rightarrow}N(0,p_0 (1-p_0))$$ By the continuous mapping theorem (if $z_n \stackrel{d}{\rightarrow} z$ and $g$ is a continuous function, then $g(z_n) \stackrel{d}{\rightarrow} g(z)$) and Slutsky's theorem (if $z_n \stackrel{d}{\rightarrow} z$ and $y_n \stackrel{p}{\rightarrow} y$, then $z_ny_n \stackrel{d}{\rightarrow} zy$), $$\frac{\sqrt{n}(\widehat p - p_0)}{\sqrt{p_0 (1-p_0)}} \stackrel{d}{\rightarrow}N(0,1)$$ and $$\frac{\sqrt{n}(\widehat p - p_0)}{\sqrt{\widehat p (1-\widehat p)}} \stackrel{d}{\rightarrow}N(0,1)$$ Hence, asymptotically it makes no difference whether you use $p_0$ or $\widehat p$.
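An empirical check of this asymptotic equivalence (the `n`, `p0`, seed, and trial count below are my own illustrative choices): under $H_0$, both statistics should have a sample standard deviation near 1 for large $n$:

```python
# Under H0, both Z and Z_tilde converge in distribution to N(0, 1), so for
# large n their simulated standard deviations should both be close to 1.
import random
from math import sqrt
from statistics import stdev

random.seed(1)
n, p0, trials = 500, 0.4, 2000
zs, zts = [], []
for _ in range(trials):
    y = sum(random.random() < p0 for _ in range(n))
    p_hat = y / n
    zs.append(sqrt(n) * (p_hat - p0) / sqrt(p0 * (1 - p0)))
    zts.append(sqrt(n) * (p_hat - p0) / sqrt(p_hat * (1 - p_hat)))

sd_z, sd_zt = stdev(zs), stdev(zts)   # both should be near 1
```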

Two final points: (1) in finite samples (as @Eupraxis1981 showed in some simulations), $p_0$ may work better (especially if the true $p_0$ is far from 0.5); (2) there are many other estimators (in economics at least) that are based on $\sqrt {n}(\widehat \theta - \theta_0) \stackrel{d}{\rightarrow}N(0,V)$ where $V$ cannot be calculated under the null, so you have no option but to estimate $V$ and use the test statistic $\frac{\sqrt{n}(\widehat \theta - \theta_0)}{\sqrt{\widehat V}}$ (e.g. http://en.wikipedia.org/wiki/Generalized_method_of_moments).