Minimum sample size for achieving the desired margin of error


When trying to find a confidence interval for an unknown population mean, we can achieve a desired margin of error by ensuring that our sample size is large enough. The textbook I'm using gives the following formula:

$$ n\ge\left(\frac{z_{\alpha/2}s}{E}\right)^2 $$

Now, we use a z-distribution instead of a t-distribution because specifying a t-distribution requires us to know the degrees of freedom, and thus the sample size. With that in mind, it's a bit strange to me that we are using a sample standard deviation here.

Many internet resources state this formula using the population standard deviation. I understand that if we don't know the population standard deviation then we use $s$ to estimate it; the thing that throws me off a bit is that the sample size used to compute $s$ does not appear in the formula! If we use a certain sample to compute $s$, then we might as well use the corresponding t-distribution, no?

So basically I'm confused about using a z-distribution instead of a t-distribution even though we are using a sample standard deviation... insight appreciated!

Best answer

The sample size used to compute $\ s\ $ does appear in the formula, because $$ s=\sqrt{\frac{1}{n-1}\sum_{i=1}^n\left(X_i-\frac{1}{n}\sum_{j=1}^nX_j\right)^2}\ , $$ where $\ X_1,X_2,\dots,X_n\ $ are the sample values.

The formula $\ n\ge\left(\frac{t_{n-1,\alpha/2}s}{E}\right)^2\ $ is rigorously true (in the sense explained below), whereas $\ n\ge\left(\frac{z_{\alpha/2}s}{E}\right)^2\ $ is at best an approximation, valid when $\ \left(\frac{z_{\alpha/2}s}{E}\right)^2\ $ is sufficiently large. This follows from the fact that Student's $\ t$-distribution with $\ n\ $ degrees of freedom converges to the standard normal as $\ n\ $ tends to infinity. I can't see how either of these formulae could be of any use for obtaining a prior estimate of your required sample size, however, for the simple reason that you can't compute the value of $\ s\ $ until after you've taken the sample.
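To see how rough the $\ z$-based version can be for small samples, here is a minimal simulation sketch. It assumes a normal population with mean 0 and standard deviation 1 (arbitrary illustrative values, not from the question), and the function name `z_interval_coverage` is made up for this example. It estimates how often the interval $\ M_n\pm z_{\alpha/2}s/\sqrt{n}\ $ actually contains $\ \mu\ $.

```python
import math
import random
from statistics import NormalDist


def z_interval_coverage(n, alpha=0.05, trials=5000, seed=0):
    """Estimate how often M_n +/- z_{alpha/2} * s / sqrt(n) covers the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    mu, sigma = 0.0, 1.0  # assumed population parameters, chosen only for illustration
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sample) / n
        # Sample standard deviation with the n - 1 divisor, as in the formula for s.
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        if abs(m - mu) <= z * s / math.sqrt(n):
            hits += 1
    return hits / trials


for n in (5, 30, 100):
    print(n, z_interval_coverage(n))
```

For $\ n=5\ $ the estimate should land near $\ P\big(|T_4|\le1.96\big)\approx0.88\ $ rather than the nominal $\ 0.95\ $, and it should move towards $\ 0.95\ $ as $\ n\ $ grows, which is the convergence mentioned above.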

If the population is normally distributed with mean $\ \mu\ $, and $\ M_n\ $ is the sample mean, then $$ T=\frac{\sqrt{n}\big(M_n-\mu\big)}{s} $$ follows a Student's $\ t$-distribution with $\ n-1\ $ degrees of freedom. Therefore $$ P\left(\big| M_n-\mu\ \,\big|\le\frac{t_{n-1,\alpha/2}s}{\sqrt{n}}\right)=1-\alpha\ , $$ and so $\ \left[{-}\frac{t_{n-1,\alpha/2}s}{\sqrt{n}},\frac{t_{n-1,\alpha/2}s}{\sqrt{n}}\right]\ $ is a $\ 100(1-\alpha)\% $ confidence interval for $\ M_n-\mu\ $. For the width of this interval to be at most $\ 2E\ $, you will need $\ n\ge\left(\frac{t_{n-1,\alpha/2}s}{E}\right)^2\ $.
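Spelling out that last step: the interval has width $\ \frac{2\,t_{n-1,\alpha/2}s}{\sqrt{n}}\ $, so requiring the width not to exceed $\ 2E\ $ gives $$ \frac{2\,t_{n-1,\alpha/2}s}{\sqrt{n}}\le2E\ \iff\ \sqrt{n}\ge\frac{t_{n-1,\alpha/2}s}{E}\ \iff\ n\ge\left(\frac{t_{n-1,\alpha/2}s}{E}\right)^2\ . $$ Note that $\ n\ $ appears on both sides, through the degrees of freedom of $\ t_{n-1,\alpha/2}\ $ and through $\ s\ $, so this is a condition to check for a given sample rather than a closed-form expression for $\ n\ $.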

There's one other possible route by which your textbook's author might have arrived at the formula $\ n\ge\left(\frac{z_{\alpha/2}s}{E}\right)^2\ $. If $\ \sigma\ $ is the population standard deviation, then (again assuming a normally distributed population) $$ Z=\frac{\sqrt{n}\big(M_n-\mu\big)}{\sigma} $$ follows a standard normal distribution. Therefore $$ P\left(\big| M_n-\mu\ \,\big|\le\frac{z_{\alpha/2}\sigma }{\sqrt{n}}\right)=1-\alpha\ , $$ and $\ \left[{-}\frac{z_{\alpha/2}\sigma}{\sqrt{n}},\frac{z_{\alpha/2}\sigma}{\sqrt{n}}\right]\ $ is a $\ 100(1-\alpha)\% $ confidence interval for $\ M_n-\mu\ $. For the width of this interval to be at most $\ 2E\ $, you will need $\ n\ge\left(\frac{z_{\alpha/2}\sigma}{E}\right)^2\ $. When $\ \left(\frac{z_{\alpha/2}\sigma}{E}\right)^2\ $ is sufficiently large, however, the sample standard deviation $\ s\ $ is likely to be a good approximation for $\ \sigma\ $, and so the inequality $\ n\ge\left(\frac{z_{\alpha/2}s}{E}\right)^2\ $ implies that the width of the $\ 100(1-\alpha)\% $ confidence interval is likely to be not much greater than $\ 2E\ $.
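As a rough numerical illustration of the $\ z$-based bound, the sketch below computes the smallest integer $\ n\ $ satisfying $\ n\ge\left(\frac{z_{\alpha/2}\sigma}{E}\right)^2\ $. It assumes a planning value for the standard deviation is already available (for instance a known $\ \sigma\ $, or a guess standing in for it); the function name `z_sample_size` and the numbers 15 and 2 are hypothetical and chosen only for the example.

```python
import math
from statistics import NormalDist


def z_sample_size(sd, margin, alpha=0.05):
    """Smallest integer n with n >= (z_{alpha/2} * sd / margin) ** 2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile of N(0, 1)
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical planning values: standard deviation 15, desired margin of error 2.
print(z_sample_size(sd=15.0, margin=2.0))  # prints 217
```

Here $\ z_{0.025}\approx1.96\ $, so $\ (1.96\times15/2)^2\approx216.1\ $ and the bound rounds up to 217. If the planning value turns out to be far from the $\ s\ $ eventually observed, the realised margin of error will differ accordingly.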