Estimating sample size based on partial results?


Say I have a fair 100-sided die that I rolled an unknown number of times. I know that I rolled a 7 exactly 219 times. Intuitively, I know there were probably around 21900 rolls (since ~1% of them would be 7) - but how do I calculate a 95% confidence interval around the total number of rolls?


There are 2 best solutions below


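The indicator of rolling a 7 on a single roll has variance $\sigma^2 = 0.01 - 0.0001 = 0.0099$. The standard deviation of the number of 7's in a sample of size $n$ is $\sigma\sqrt{n}$, which for $n \approx 21900$ is about $14.7$. For 95% confidence you need approximately 2 standard deviations, i.e. about $29.4$ sevens. Dividing by the probability $0.01$ converts this into rolls, so the original total sample has an interval of about $\pm 2940$ around $21900$.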


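If $N$ is the total number of rolls, then $M = N - 219$ is the number of non-7 rolls. Treating the rolling as stopping at the 219th 7, $M$ counts the failures before the 219th success with success probability $0.01$, so it follows a negative binomial distribution: $M \sim NB(219, 0.01)$.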

The mean of $M$ is $\frac{219 \times 0.99}{0.01} = 21681$, and its variance is $\frac{219 \times 0.99}{0.01^2} = 2168100$, so its standard deviation is $\sqrt{2168100} \approx 1472$.
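The arithmetic above is easy to check numerically. The following is a minimal sketch, assuming the "failures before the $r$-th success" form of the negative binomial; the helper name `nb_failure_moments` is made up for illustration.

```python
import math


def nb_failure_moments(r, p):
    """Mean and variance of the number of failures before the r-th success,
    where each trial succeeds independently with probability p."""
    mean = r * (1 - p) / p
    variance = r * (1 - p) / p**2
    return mean, variance


r, p = 219, 0.01  # 219 sevens observed, 1/100 chance of a 7 per roll
mean_m, var_m = nb_failure_moments(r, p)
sd_m = math.sqrt(var_m)

print(f"mean = {mean_m:.0f}, variance = {var_m:.0f}, sd = {sd_m:.1f}")
```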

Now, since a negative binomial random variable can be expressed as a sum of iid geometric random variables, we can apply the central limit theorem and approximate $M$ by a normal distribution. Taking the mean $\pm 1.96$ standard deviations gives a 95% confidence interval for $M$ of roughly $(18795, 24567)$. Since $N = M + 219$, the corresponding interval for the total number of rolls is roughly $(19014, 24786)$.
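The normal approximation can be reproduced with a few lines of Python. This is only a sketch of that approximation, not an exact negative binomial interval; it assumes the mean and standard deviation computed above and uses the standard library's `statistics.NormalDist` for the 97.5% quantile.

```python
import math
from statistics import NormalDist

r, p = 219, 0.01  # 219 sevens observed, 1/100 chance of a 7 per roll

# Moments of M, the number of non-7 rolls (failures before the r-th success).
mean_m = r * (1 - p) / p
sd_m = math.sqrt(r * (1 - p)) / p

# Two-sided 95% interval under the normal approximation.
z = NormalDist().inv_cdf(0.975)  # about 1.96
lo_m, hi_m = mean_m - z * sd_m, mean_m + z * sd_m

# Shift by the 219 sevens to get an interval for the total N = M + 219.
lo_n, hi_n = lo_m + r, hi_m + r

print(f"M: ({lo_m:.0f}, {hi_m:.0f})")
print(f"N: ({lo_n:.0f}, {hi_n:.0f})")
```

With 219 geometric terms in the sum, the normal approximation should be reasonable, though the negative binomial is slightly right-skewed, so an exact interval would sit a little higher.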