Which is the right formula for margin of error in Hypothesis test for mean with known / unknown variance?

I'm preparing for an exam and I think there might be a mistake in my notes, since I find inconsistencies when I try to solve some of the problems.

It's about the formulas to find $n$ (sample size) or $b$ (Margin of Error):

When we have a hypothesis test for one $\mu$ with known or assumed variance, do we use

$$n \geq \left(\frac{z_{\alpha/2}\cdot \sigma}{b}\right)^2$$

and, when we have a hypothesis test for one $\mu$ with unknown variance,

$$n\geq \left(\frac{t_{n-1; \alpha/2}\cdot s}{b}\right)^2$$

or have I switched them around?

Also, when I'm given information in the text only about the standard deviation but no information on the variance, do I use the Hypothesis test for unknown variance (since we're not explicitly told about it) or the other one for known/assumed variance (since I can find the variance from the standard deviation)?

Best answer

My suggestion in stats is that you try not to memorize formulas but understand how to get them from the fundamentals. That will make things less stressful in the long run.

Both the Z-test and t-test assume we are sampling from a normal population (although they are used much more broadly as approximations).

Let $X_1, X_2, \ldots, X_n$ be $n$ iid Normal random variables with mean $\mu$ and standard deviation $\sigma$.

The goal of our inference is to estimate $\mu$ with confidence level $1-\alpha$ and margin of error $b$ from the sample mean $\bar{X}:=\sum_{i=1}^n \frac{X_i}{n}$.

You may have learned this already, but the nice thing about Normal random variables is that a sum of independent Normal random variables also has a Normal distribution: the mean of the sum is the sum of the means and, because of independence, the variance of the sum is the sum of the variances. Pretty nice.

One other fact that you may have learned is the algebra of expectations and variances:

$$E[aX] = aE[X], \qquad V[aX]=a^2V[X]$$

This means that we also know the distribution of the sample mean in this case:

$$\bar{X}:=\sum_{i=1}^n \frac{X_i}{n}=\frac{1}{n} \sum_{i=1}^n X_i \sim N\left(\frac{\mu n}{n}, \frac{n\sigma^2}{n^2}\right)=N\left(\mu, \frac{\sigma^2}{n}\right)$$

Therefore, the sample mean is also normally distributed with the same mean as the population, but its variance is $n$ times smaller.
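If you want to convince yourself of the $\sigma^2/n$ result numerically, here is a minimal simulation sketch using only the Python standard library. The function name `variance_of_sample_mean` and the values $\mu=5$, $\sigma=2$, $n=25$ are my own illustrative choices, not anything from your notes.

```python
import random
import statistics


def variance_of_sample_mean(mu, sigma, n, reps, seed=0):
    """Empirical variance of the sample mean over `reps` simulated samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.pvariance(means)


# Illustrative (assumed) values: sigma = 2 and n = 25, so sigma^2 / n = 0.16.
sigma, n = 2.0, 25
empirical = variance_of_sample_mean(mu=5.0, sigma=sigma, n=n, reps=20_000)
theoretical = sigma**2 / n
print(f"empirical: {empirical:.4f}, theoretical: {theoretical:.4f}")
```

The two printed numbers should agree up to simulation noise, and the agreement gets tighter as `reps` grows.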

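The 2-sided $1-\alpha$ confidence interval for $\mu$ has half-width (i.e., margin of error) $b= z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard Normal distribution. If we want to fix $\alpha$ and $b$, we can solve for $n$:

$$ \sqrt{n}= z_{\alpha/2}\frac{\sigma}{b} \implies n = \left(z_{\alpha/2}\frac{\sigma}{b}\right)^2$$

Since $n$ must be a whole number, round up; any larger $n$ gives a smaller margin of error, which is where the $\geq$ in your formula comes from.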

If we know $\sigma$ then we are done.
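As a quick check of the known-$\sigma$ case, here is a small sketch that computes the smallest $n$ for a target margin of error. The helper names `sample_size_known_sigma` and `margin_of_error` and the numbers $\sigma=10$, $b=2$, $\alpha=0.05$ are made up for illustration; `NormalDist().inv_cdf` from the standard library supplies $z_{\alpha/2}$.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, alpha):
    """Half-width b = z_{alpha/2} * sigma / sqrt(n) of the two-sided interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / math.sqrt(n)


def sample_size_known_sigma(sigma, b, alpha):
    """Smallest integer n with margin of error at most b when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / b) ** 2)


# Assumed example values: sigma = 10, b = 2, alpha = 0.05.
n_min = sample_size_known_sigma(sigma=10, b=2, alpha=0.05)
print(n_min)  # 97, since (1.96 * 10 / 2)^2 is about 96.04
print(margin_of_error(sigma=10, n=n_min, alpha=0.05) <= 2)  # True
```

With one observation fewer ($n=96$) the margin of error comes out just above 2, which is why the result is rounded up rather than to the nearest integer.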

Now, if we don't, then we have to estimate it from the data, in which case we are using the sample standard deviation $s$, not $\sigma$. In this case, we have to use the t-distribution to take into account the extra variability from estimating $\sigma$ (i.e., our T-statistic $\frac{\bar{X}-\mu_0}{s/\sqrt{n}}$ will have a t-distribution with $n-1$ degrees of freedom, not a normal distribution).
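To see why the normal cutoff is not good enough once $\sigma$ is replaced by $s$, here is a rough simulation sketch (standard library only). It draws many small samples under the null, computes the T-statistic, and counts how often it falls outside $\pm z_{\alpha/2}$. The function name `tail_rate`, the sample size $n=5$, and the choice $\mu_0=0$, $\sigma=1$ are assumptions made purely for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist


def tail_rate(n, cutoff, reps, seed=0):
    """Fraction of simulated T-statistics with |T| > cutoff when the null is true."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    mu0, sigma = 0.0, 1.0  # assumed true mean and standard deviation
    exceed = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
        t_stat = (xbar - mu0) / (s / math.sqrt(n))
        if abs(t_stat) > cutoff:
            exceed += 1
    return exceed / reps


z_cut = NormalDist().inv_cdf(0.975)  # normal cutoff for alpha = 0.05
rate = tail_rate(n=5, cutoff=z_cut, reps=20_000)
print(f"share of |T| > {z_cut:.2f}: {rate:.3f}")
```

If the T-statistic were standard Normal, the printed share would be close to 0.05; with $n=5$ it comes out well above that, which is the heavier tail the t-distribution accounts for.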

So, from the above, you can see that you have them the right way around: $z_{\alpha/2}$ with $\sigma$ when the variance is known, and $t_{n-1;\alpha/2}$ with $s$ when it is unknown.