Estimating the minimum sample size that keeps the sample proportion within a given range of the population proportion with probability at least $~0.99$

We want to estimate the proportion of consumers who use a particular brand of detergent.

$$\begin{align} p&:=\text{population proportion}\\ &0.8<p<0.9\\ \end{align}$$

We want to find the minimum sample size $~ n ~$ such that the difference between the sample proportion $~ \hat p ~$ and the population proportion $~ p ~$ is at most $~ 0.02 ~$ with probability at least $~ 0.99 ~$.

The following is the official solution.

Since the sample size $~ n ~$ is large, the following approximation holds:

$$\begin{align} \color{red}{\underbrace{Z={\hat p-p \over \sqrt{{pq \over n }} }\sim\mathcal N(0,1)}_{\text{How this emerged?} }} ~~~~~~~~(q:=1-p) \end{align}$$

so the following holds:

$$\color{red} {\underbrace{{P(|Z|<2.576)=0.99}}_{\text{What is this} } } $$

The remainder of the solution is as follows.

$$\begin{align} P(\left| \hat p-p \right|<0.02 )&\geq 0.99\\ 2.576 \sqrt{{pq \over n }} &\leq 0.02\\ \therefore n&\geq \left({2.576 \over 0.02 } \right)^2pq\\ \operatorname{arg}\max_{p\in[0.8,~0.9]}(p(1-p))&=0.8\\ n\geq \left({2.576 \over 0.02 } \right)^2\times0.8\times0.2&\approx2654.3\\ \min(n)&=\operatorname{ceil}(2654.3)\\&=2655 \end{align}$$
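The arithmetic in the last few lines can be checked with a short script. This is only a sketch: the function name `min_sample_size` and its parameters are illustrative and not part of the official solution, and it uses the table value $2.576$ quoted above.

```python
import math


def min_sample_size(z, margin, p):
    """Smallest integer n with z * sqrt(p * (1 - p) / n) <= margin.

    Hypothetical helper: z is the normal critical value, margin is the
    allowed gap |p_hat - p|, and p is the assumed population proportion.
    """
    return math.ceil((z / margin) ** 2 * p * (1 - p))


# Worst case over p in [0.8, 0.9]: p * (1 - p) decreases for p > 0.5,
# so it is largest at the left endpoint p = 0.8.
n = min_sample_size(z=2.576, margin=0.02, p=0.8)
print(n)  # 2655
```

Evaluating the same function at $p=0.9$ gives a smaller bound, which is consistent with taking $p=0.8$ as the worst case.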

I presume that the first red-marked equation is a standard formula.

Can anyone tell me the name of it?

BTW the space between $~\operatorname{arg}~$ and $~\operatorname{max}~$ seems large. I wonder if there is some good way to shrink it.

There are 2 answers below.

Best answer

$n\hat{p}$ is the number of customers in your sample who use the detergent. It is a $\text{Binomial}(n, p)$ random variable which has mean $np$ and variance $npq$.

By the normal approximation to the binomial distribution, $n\hat{p}$ is approximately normal with mean $np$ and variance $npq$.

By shifting by the mean $np$ and scaling by the standard deviation $\sqrt{npq}$, we see that $\frac{n\hat{p}-np}{\sqrt{npq}} = \frac{\hat{p}-p}{\sqrt{pq/n}}$ is approximately standard normal. This explains the first red equation.
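One way to get a feel for how good the approximation is in this problem is to compute the exact binomial probability that $\hat p$ lands within $0.02$ of $p$. The sketch below assumes $n=2655$ and $p=0.8$ from the question; the helper names `binom_pmf` and `prob_within` are made up for illustration, and the pmf is evaluated in log space only to avoid floating-point underflow.

```python
import math


def binom_pmf(k, n, p):
    """Binomial(n, p) pmf at k, computed in log space to avoid underflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )
    return math.exp(log_pmf)


def prob_within(n, p, margin):
    """Exact P(|p_hat - p| <= margin), where n * p_hat ~ Binomial(n, p)."""
    return sum(
        binom_pmf(k, n, p)
        for k in range(n + 1)
        if abs(k / n - p) <= margin
    )


# Assumed values from the question: n = 2655, p = 0.8, margin = 0.02.
exact = prob_within(n=2655, p=0.8, margin=0.02)
print(exact)  # should land close to 0.99 if the normal approximation is good
```

If the printed value is close to $0.99$, the normal approximation is doing its job at this sample size.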

The second red equation comes from reading a standard normal table or using a computer to compute the value $\alpha$ such that $P(|Z| < \alpha) = 0.99$. It turns out it is $\alpha\approx 2.576$.
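For the "using a computer" route, a minimal sketch with Python's standard library is shown here. It assumes Python 3.8 or later, where `statistics.NormalDist` is available; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# P(|Z| < a) = 0.99 leaves 0.005 in each tail, so a is the 0.995 quantile.
alpha = std_normal.inv_cdf(0.995)

# Reverse check: probability mass of (-2.576, 2.576) under N(0, 1).
coverage = std_normal.cdf(2.576) - std_normal.cdf(-2.576)

print(round(alpha, 3))  # 2.576
print(coverage)
```

Note that $2.576$ is a rounded table value. The unrounded quantile is slightly smaller, so plugging it into the bound for $n$ can shift the result near an integer boundary.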

Regarding typesetting "argmax", you need to define a custom command since LaTeX doesn't have a built-in command for it.
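A minimal sketch of such a definition, assuming the `amsmath` package is loaded (the macro name `\argmax` is just a conventional choice, not something LaTeX predefines):

```latex
\documentclass{article}
\usepackage{amsmath}
% Starred form: limits are set underneath the operator in display style.
% The thin space \, replaces the wide gap between "arg" and "max".
\DeclareMathOperator*{\argmax}{arg\,max}
\begin{document}
\[ \argmax_{p \in [0.8,\, 0.9]} p(1-p) = 0.8 \]
\end{document}
```

On sites that render math with MathJax, where a preamble is not available, writing `\operatorname*{arg\,max}_{p\in[0.8,~0.9]}` inline should give a similar result.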

Another answer

The first equation marked in red comes from the Central Limit Theorem. The sample proportion $\hat{p}$ is a sample mean of the variable indicating whether a given person falls in the required category, so for sufficiently large sample size we can say that it is approximately normally distributed. The red equation represents standardising the statistic by subtracting its expected value and dividing by its standard deviation, which can be derived by noting that $n\hat{p}$ has a binomial distribution and applying known results.

The second equation in red comes from looking up the value of $z$ for which $P(-z < Z < z) = 0.99$ when $Z \sim N(0, 1)$, which turns out to be approximately $2.576$.