Central Limit Theorem and Normal Distribution problem.


Suppose I have a sample of $n$ people, each of whom smokes with probability $p$. I am asked how large $n$ should be so that the proportion of smokers in the sample is within 0.01 of $p$ with probability at least 0.95.

I saw one answer and didn't understand one of its inferences:

Let $S_n$ represent the number of smokers. I am looking for $n$ that satisfies $P(|{S_n\over n}-p|\le 0.01)\ge 0.95$. This is equivalent to $P(-{0.01\sqrt{n}\over \sqrt{p(1-p)}} \le{S_n-np\over \sqrt{np(1-p)}}\le {0.01\sqrt{n}\over \sqrt{p(1-p)}})\ge 0.95$. But then it is said that $n$ should satisfy ${0.01\sqrt{n}\over \sqrt{p(1-p)}}\ge 1.96$. Why? If that is the bound, then the normal distribution table gives me $\Phi(1.96)\approx 0.975$, so the probability isn't 0.95. What am I missing? I would really appreciate any sort of help.


There are 2 best solutions below

1
On BEST ANSWER

You should get from the normal distribution table that $\int_{-\infty}^{1.96} \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx = \Phi(1.96)\approx 0.975$. This means that:

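$P\left[-x \leq \frac{S_n-np}{\sqrt{np(1-p)}} \leq x\right] \approx \Phi(x) - \Phi(-x) = 2\Phi(x) - 1$, and $2\Phi(x) - 1 = 0.95 \iff \Phi(x) = 0.975 \iff x = 1.96$

Here $x = \frac{0.01\sqrt{n}}{\sqrt{p(1-p)}}$, and $2\Phi(x) - 1$ increases with $x$, so the probability is at least 0.95 exactly when $x \ge 1.96$.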

Summary: don't forget there are two tails. Also: it's easier to draw it than follow my ugly notation.
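
For anyone who wants to check the numbers, here is a minimal Python sketch of the calculation above. The function name `normal_sample_size` and its parameters `eps` and `conf` are my own choices for illustration, not part of the original problem, and the result relies on the normal approximation to the binomial being reasonable for the $n$ involved.

```python
from math import ceil
from statistics import NormalDist


def normal_sample_size(p, eps=0.01, conf=0.95):
    """Smallest n with eps*sqrt(n)/sqrt(p(1-p)) >= z (normal approximation).

    Hypothetical helper: eps is the allowed deviation from p,
    conf is the required two-sided probability.
    """
    # Two tails: we need Phi(z) = 1 - (1 - conf)/2, i.e. 0.975 for conf = 0.95.
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Solve eps*sqrt(n)/sqrt(p(1-p)) >= z for n and round up.
    return ceil((z / eps) ** 2 * p * (1 - p))


if __name__ == "__main__":
    print(round(NormalDist().inv_cdf(0.975), 2))  # 1.96
    print(normal_sample_size(0.5))                # 9604 (worst case, p = 1/2)
```

If $p$ is unknown, $p = \frac12$ maximizes $p(1-p)$, so the value printed for `p = 0.5` is the conservative choice under this approximation.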

0
On

As described, it is a fair assumption that $S_n$ has a Binomial Distribution. Then $\frac{S_n}n$ has a mean of $p$ and a variance of $\frac{p(1-p)}n$. Chebyshev's Inequality then says that
$$ P\left(\left|\frac{S_n}{n}-p\right|\ge0.01\right)\le\frac{\frac{p(1-p)}n}{0.01^2}\tag{1} $$
Taking complements, we get
$$ P\left(\left|\frac{S_n}{n}-p\right|\lt0.01\right)\ge1-\frac{\frac{p(1-p)}n}{0.01^2}\tag{2} $$
If we have $n\ge200000p(1-p)$, then the right-hand side of $(2)$ is at least $0.95$, so
$$ P\left(\left|\frac{S_n}{n}-p\right|\lt0.01\right)\ge0.95\tag{3} $$
If we don't know $p$, we can use that the maximum of $p(1-p)$ is $\frac14$. This means that $n\ge50000$ will assure that $(3)$ holds.
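
The Chebyshev bound is easy to evaluate numerically. Below is a small Python sketch; the function name `chebyshev_sample_size` and the parameters `eps` and `alpha` (the allowed failure probability, 0.05 here) are assumptions made for illustration. Inputs are passed as strings and converted to `Fraction` so that the arithmetic is exact and the rounding up is not affected by floating-point error.

```python
from fractions import Fraction
from math import ceil


def chebyshev_sample_size(p, eps="0.01", alpha="0.05"):
    """Smallest n with p(1-p) / (n * eps^2) <= alpha (Chebyshev bound).

    Hypothetical helper: pass decimals as strings (e.g. "0.5") so that
    Fraction represents them exactly.
    """
    p, eps, alpha = Fraction(p), Fraction(eps), Fraction(alpha)
    # Rearranged from p(1-p)/(n*eps^2) <= alpha; for eps = 0.01, alpha = 0.05
    # this is n >= 200000 * p * (1 - p).
    return ceil(p * (1 - p) / (eps ** 2 * alpha))


if __name__ == "__main__":
    print(chebyshev_sample_size("0.5"))  # 50000 (worst case, p = 1/2)
    print(chebyshev_sample_size("0.2"))  # 32000
```

Chebyshev makes no normality assumption, which is why this bound is considerably larger than the one obtained from $x = 1.96$ in the normal approximation.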