Hello all, I'm trying to do an estimation problem at work and wondering if I'm on the right track!
I'm running a study, and it's on the internet. I'm trying to determine how many people I need to show an advertisement to in order to have 80%-95% confidence that I reach half of those people. Here are the numbers: the population is 220,000,000, and the sample of people I'm trying to reach is 1000 people. So the question is: how many advertisements will I need to show in order to be 80%-95% certain that I hit at least 500 of those 1000 people, without replacement?
My first thought is that:
the sum of 1000/220,000,000 + 999/219,999,999 + 998/219,999,998..... = 0.2275003444% which is the probability of success of hitting all 1000 people without replacement
the sum of 500/220,000,000 + 499/219,999,999 + 498/219,999,998..... = 0.0569319906% which is the probability of hitting 500 of the 1000 people without replacement
I'm having trouble with the next step: how do I estimate how many times I need to show an advertisement to those 220,000,000 people to ensure, with 80%-95% confidence, that I hit at least 500?
I cannot help but think this is now a binomial estimation problem and I need to set it equal to 0.80-0.95 and solve for k. Am I thinking about this right, and if so, how do you solve for k? Is that even possible?
$\binom{n}{k}p^k (1-p)^{n-k} = 0.8$, solve for $k$
The population size of $220,000,000$ is irrelevant; the relevant population is the fixed sample/control group of $1000$.
Let $p$ be the probability that a given member of this group has seen the ad. On average, $\Bbb E(X)=1000p$ of them have seen the ad, with a variance of $\Bbb V(X)=1000p(1-p)$, according to a binomial distribution of the random variable $X\sim B(1000,p)$. To be absolutely precise, you would have to solve the equation in which the sum of the probabilities for $X=500,501,...,1000$ equals the target probability, $$ \sum_{k=500}^{1000}\binom{1000}{k}p^k(1-p)^{1000-k}=p_{target}\in[0.80,0.95]. $$ Since for $p=0.5$ this sum is $\approx0.5$ and for $p=1$ it is $1$, you can solve this via bisection or regula falsi or some kind of bracketed Newton in a small number of steps.
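The exact route described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `tail_probability` and `required_coverage` are made up for this example, the binomial terms are evaluated in log space via `lgamma` to avoid overflow, and plain bisection is used on the bracket $[0.5, 1]$ (any of the bracketed root finders mentioned would do).

```python
from math import exp, fsum, lgamma, log

N = 1000      # size of the fixed group, from the question
NEEDED = 500  # minimum number of group members that must be reached


def tail_probability(p, n=N, k_min=NEEDED):
    """P(X >= k_min) for X ~ Binomial(n, p), valid for 0 < p < 1."""
    log_p, log_q = log(p), log(1.0 - p)
    terms = []
    for k in range(k_min, n + 1):
        # log of C(n, k) * p^k * (1-p)^(n-k), using lgamma for the binomial coefficient
        log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        terms.append(exp(log_term))
    return fsum(terms)


def required_coverage(target, lo=0.5, hi=1.0, tol=1e-10):
    """Bisection on p; relies on the tail probability increasing with p."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)  # only interior points are evaluated
        if tail_probability(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for target in (0.80, 0.95):
        print(target, required_coverage(target))
```

The results should land slightly above $0.5$, in the same region as the normal approximation.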
However, the sample is large enough that you can approximate the binomial distribution $X\sim B(1000,p)$ by the normal distribution $X\sim N(1000p, 1000p(1-p))$ with the same expectation and variance. This can be related to a standard normal random variable $Y\sim N(0,1)$ via the linear transformation $$ X=1000p+\sqrt{1000p(1-p)}Y, \quad \text{or }Y=F(X,p)=\frac{X-1000p}{\sqrt{1000p(1-p)}}. $$ This approximation is good for $5\% < p < 95\%$.
Now you can work with the quantiles of the normal distribution. For $$ p_{target}=P(X\ge 500)=P(Y\ge -q)=P(-Y\le q)\in [80\%,95\%] $$ one needs $Y=F(X,p)\ge F(500,p) =-q$ with $q\in[0.84,1.65]$, where $q$ is the quantile of the standard normal distribution for the probability $p_{target}$, or equivalently $-q$ is the quantile for the probability $1-p_{target}$.
Obviously, $p>0.5$. So solve \begin{align} -q=F(500,p)=\frac{500-1000p}{\sqrt{1000p(1-p)}} &\iff 1000(1-2p)^2= q^2\,(1-(1-2p)^2)\\ &\iff (1000+q^2)(1-2p)^2= q^2\\ &\iff p= \frac12+\frac{|q|}{2\sqrt{1000+q^2}} \end{align} which gives a necessary coverage $p$ between $51.3\%$ and $52.6\%$.
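As a quick numerical check of the closed form, here is a minimal Python sketch. The function name `coverage_normal_approx` is hypothetical, and the formula is only valid for the case treated here, where the threshold is exactly half of the group size `n`. The quantile comes from `statistics.NormalDist` in the standard library (Python 3.8+).

```python
from math import sqrt
from statistics import NormalDist


def coverage_normal_approx(target, n=1000):
    """Required coverage p so that P(X >= n/2) is about `target`,
    using the normal approximation to Binomial(n, p)."""
    q = NormalDist().inv_cdf(target)  # standard normal quantile for `target`
    return 0.5 + abs(q) / (2.0 * sqrt(n + q * q))


if __name__ == "__main__":
    for target in (0.80, 0.95):
        print(f"{target:.2f} -> {coverage_normal_approx(target):.4f}")
```

For targets of $80\%$ and $95\%$ this reproduces the range of roughly $51.3\%$ to $52.6\%$.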
This computation gives you the cumulative probability that a person has seen one of the ads. For the extended problem, to get the required number of showings of the ad, the coverage estimate for a single showing is missing. If this were, for example, $10\%$, then the probability of having seen at least one of $k$ showings is at least $52\%$ (just another example value from the computed range) if $1-(1-10\%)^k\ge 52\%$, so you would have to calculate $$ k\ge\log(0.48)/\log(0.9)=6.9662..., $$ that is, at least $7$ showings.
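The last step can be written as a small helper. This is a sketch only: the name `showings_needed` is invented for illustration, the $10\%$ reach per showing is the example value used above rather than a measured figure, and the formula assumes that each showing reaches a given person independently with the same probability.

```python
from math import ceil, log


def showings_needed(coverage, reach_per_showing):
    """Smallest integer k with 1 - (1 - reach_per_showing)**k >= coverage,
    assuming independent showings with identical reach."""
    return ceil(log(1.0 - coverage) / log(1.0 - reach_per_showing))


if __name__ == "__main__":
    # Example values from the text: 52% required coverage, 10% reach per showing
    print(showings_needed(0.52, 0.10))
```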