A researcher takes a sample of size $n$ from a population to estimate the proportion of smokers $p$. What sample size is required to guarantee, with probability $0.95$, that the sample proportion of smokers won't differ from $p$ by more than $0.01$? (Note: the researcher doesn't know the value of $p$.)
Using Chebyshev's inequality I was able to get the bound $n\geq 50000$, but I'm stuck trying to get a tighter bound using the CLT.
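For reference, the Chebyshev bound can be checked numerically. A minimal sketch (variable names are mine): Chebyshev gives $P(|S_n/n - p| > \varepsilon) \leq p(1-p)/(n\varepsilon^2) \leq 1/(4n\varepsilon^2)$, and requiring this to be at most $0.05$ yields the sample size.

```python
# Chebyshev: P(|S_n/n - p| > eps) <= p(1-p) / (n * eps^2) <= 1 / (4 * n * eps^2),
# since p(1-p) <= 1/4. Requiring the right-hand side <= alpha and solving for n:
eps = 0.01    # allowed error in the proportion
alpha = 0.05  # allowed failure probability (1 - 0.95)

n_chebyshev = 1 / (4 * alpha * eps**2)
print(round(n_chebyshev))  # 50000
```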
If I let $S_n$ be the number of smokers in $n$ people, and assume that each person smokes independently of the others with probability $p$, then I want to find
$$P\left(\left|S_{n}-np\right|\leq0.01n\right)$$
which can be written in "CLT form" as
$$P\left(\frac{-0.01n}{\sqrt{np\left(1-p\right)}}\leq\frac{S_{n}-np}{\sqrt{np\left(1-p\right)}}\leq\frac{0.01n}{\sqrt{np\left(1-p\right)}}\right)$$
which then tells me that this is approximately
$$2\phi\left(\frac{0.01n}{\sqrt{np\left(1-p\right)}}\right)-1$$
but herein lies my problem: as $p\to 0$ (or $p\to 1$) we have $p(1-p)\to 0$, which seems to mean that the closer $p$ is to $0$ or $1$, the larger I would require $n$ to be, and I can't just bound $p(1-p)$ from below (unlike with Chebyshev's inequality, where I bounded it from above). What is the right way to continue?
Edit: The version of CLT we were given is that if $S$ is the sum of independent, equally distributed, random variables with mean $\mu$ and variance $\sigma^2$, then
$$P\left(\frac{S-n\mu}{\sqrt{n\sigma^2}}\leq b\right)\approx \phi(b)$$
$\phi$ being the standard normal CDF.
By your calculation, we want the displayed probability to be $\ge 0.95$. So, using the normal approximation to the binomial, we see that we want $$\frac{0.01\sqrt{n}}{\sqrt{p(1-p)}}\gt 1.96,$$ or equivalently $$n\gt \frac{(1.96)^2}{(0.01)^2}p(1-p).$$ Note that on the interval $(0,1)$, the function $f(x)=x(1-x)$ reaches its maximum of $\frac{1}{4}$ at $x=\frac{1}{2}$. So the largest possible value of $p(1-p)$ is $\frac{1}{4}$, and therefore we can take $$n\approx \frac{(1.96)^2}{4(0.01)^2}=9604.$$
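Plugging in the numbers gives the concrete sample size (a quick numeric check; the symbols follow the derivation above):

```python
z = 1.96    # 97.5th percentile of the standard normal (two-sided 0.95)
eps = 0.01  # allowed error in the proportion

# Worst case p(1-p) = 1/4, so n > z^2 / (4 * eps^2)
n_clt = z**2 / (4 * eps**2)
print(round(n_clt))  # 9604
```

So the CLT brings the required sample size down from $50000$ to roughly $9604$.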
Remark: Informally, if the variance is small, then the required sample size is small. So we need to bound $p(1-p)$ from above, not from below.
Maximum variance is reached at $p=\frac{1}{2}$. It turns out that $p(1-p)$ is remarkably close to $\frac{1}{4}$ as long as $p$ is not too far from $\frac{1}{2}$, so taking the pessimistic view of the size of the variance often does not make much of a difference.
If $p$ is far from $\frac{1}{2}$, then the $n$ we calculated is larger than necessary. But still, with probability $\gt 0.95$, we will get at least the desired accuracy, so we can provide the required guarantee.
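One can verify this guarantee against the exact binomial distribution rather than the normal approximation. A sketch (helper names are mine, using only the standard library; the pmf is computed in log space via `math.lgamma` to avoid overflow for large $n$):

```python
import math

def binom_pmf(n, k, p):
    # Binomial pmf computed in log space to avoid overflow for large n
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def coverage(n, p, eps=0.01):
    # Exact P(|S_n - n p| <= eps * n) for S_n ~ Binomial(n, p)
    lo = max(math.ceil(n * p - eps * n), 0)
    hi = min(math.floor(n * p + eps * n), n)
    return sum(binom_pmf(n, k, p) for k in range(lo, hi + 1))

n = 9604
for p in (0.5, 0.3, 0.1, 0.02):
    print(p, round(coverage(n, p), 4))
```

Running this shows the coverage is just over $0.95$ at $p=\frac{1}{2}$ and increases toward $1$ as $p$ moves away from $\frac{1}{2}$, matching the remark that the pessimistic choice of variance only costs us when $p$ is far from $\frac{1}{2}$.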