Estimating the number of coin flips, knowing the probability $p$ of getting heads and the number of heads


Let $N$ be an unknown variable that we wish to estimate.

Assume that we know that a coin with bias $p$ was tossed $N$ times, and the outcome was $H$ heads. When both $p$ and $H$ are known, we can estimate $N$ to be $\widehat{N} \triangleq H/p$, but how do we derive confidence intervals for it?

That is, given a parameter $\delta\in(0,1/2)$, how do we find the minimal interval $[N_{min}, N_{max}]$ such that $$\Pr[N\notin [N_{min},N_{max}]]< \delta\ \ ?$$


There are 1 best solutions below


You know $p$ and $q = 1-p$ and you have seen $X$ heads in an unknown number $N$ of tosses of a coin with success probability $p.$ Your point estimate of $N$ is $\hat N = X/p.$

If your $\delta = .10,$ you are essentially asking for a 90% confidence interval for $N$ based on $\hat N = X/p.$ For $p$ and $N$ such that $Np$ and $Nq$ both exceed 5, the normal approximation to the binomial should be useful.

The estimator $\hat N$ is unbiased: $$E(\hat N) = E(X/p) = np/p = n,$$ where $n$ is the true number of tosses. Also, because $Var(X) = npq$ for a binomial count, $$Var(\hat N) = Var(X/p) = Var(X)/p^2 = npq/p^2 = nq/p.$$

So the standard error (standard deviation of the estimator) is $SD(\hat N) = \sqrt{nq/p}.$

Thus an approximate 90% confidence interval for $n$ is of the form $$\hat N \pm 1.645\sqrt{\hat N q/p},$$ because $\pm 1.645$ cut 5% from each of the upper and lower tails of the standard normal distribution.
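
The step from the standard error to this interval can be spelled out as follows. This is the usual normal-approximation argument; the symbol $z_{\delta/2}$ is notation introduced here (not used in the answer) for the value that cuts probability $\delta/2$ from the upper tail of the standard normal distribution.

$$\frac{\hat N - n}{\sqrt{nq/p}} \approx \mathsf{Norm}(0,1) \quad\Longrightarrow\quad \Pr\left(|\hat N - n| \le z_{\delta/2}\sqrt{nq/p}\right) \approx 1-\delta.$$

Replacing the unknown $n$ under the square root by its estimate $\hat N$ gives $\hat N \pm z_{\delta/2}\sqrt{\hat N q/p},$ and for $\delta = 0.10$ we have $z_{0.05} \approx 1.645.$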

For the particular case where $X=100$ and $p = 1/3,$ this amounts to $300 \pm 1.645(24.495)$ or $300 \pm 40.3,$ which is the interval $(259.7, 340.3).$
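
The arithmetic above can be reproduced with a few lines of R. This is only a sketch: the function name `n_tosses_ci` and its arguments are hypothetical, chosen for illustration, and `qnorm` supplies the normal quantile in place of the rounded constant 1.645.

```r
# Hypothetical helper (not from the original answer): normal-approximation
# confidence interval for the number of tosses, given x heads and known p.
n_tosses_ci <- function(x, p, delta = 0.10) {
  q     <- 1 - p
  n_hat <- x / p                   # point estimate of the number of tosses
  se    <- sqrt(n_hat * q / p)     # standard error, with n replaced by n_hat
  z     <- qnorm(1 - delta / 2)    # about 1.645 when delta = 0.10
  c(estimate = n_hat, lower = n_hat - z * se, upper = n_hat + z * se)
}

ci <- n_tosses_ci(100, 1/3)
print(round(ci, 1))                # estimate 300, limits about 259.7 and 340.3
```

Changing `delta` gives other confidence levels, subject to the same caveat that $Np$ and $Nq$ should both be large enough for the normal approximation.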

It is fair to ask how well the normal approximation works in this case and whether it is useful to estimate $n$ by $\hat N$ in computing the standard error.

The following simple simulation in R statistical software provides a reality check. Treating the estimate $\hat N = 300$ as if it were the true number of tosses, we generate a million new values of $X$ (and hence of $\hat N$) and make a CI from each. Nearly 90% of the new CIs cover the original $\hat N = 300$, and the new $\hat N$s have the mean and SD predicted by the discussion above. A histogram (not shown) of the new $\hat N$s has an essentially normal shape. Furthermore, quantiles .05 and .95 of the million new $\hat N$s are about 260 and 340, respectively.

p = 1/3; q = 1-p;  x = 100             # known and observed
N.hat = x/p                            # resulting point estimate
m = 10^6;  x.new = rbinom(m, N.hat, p) # simulate 10^6 new X-values based on N.hat
N.new = x.new/p                        # 10^6 new estimates
se.N = sqrt(N.new*q/p)                 #   and standard errors
lcl = N.new - 1.645*se.N               # 10^6 new CIs, lower limits
ucl = N.new + 1.645*se.N               #   and upper limits
mean(lcl < N.hat & ucl > N.hat)
## 0.899387                            # 90% of CIs cover our orig'l N.hat
mean(N.new);  sd(N.new)                # mean and std err closely match theory
## 300.026
## 24.4969
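
The listing above does not show how the quantiles mentioned earlier were obtained. One way they might be computed is sketched below. The seed and the smaller number of replications are arbitrary choices made here so the snippet runs quickly on its own. They are not taken from the original session, so the exact numbers will differ slightly from run to run.

```r
set.seed(1)                            # arbitrary seed, for reproducibility only
p <- 1/3
N.hat <- 300                           # point estimate from x = 100, p = 1/3
m <- 10^5                              # fewer replications than the 10^6 used above
N.new <- rbinom(m, N.hat, p) / p       # simulated estimates of the number of tosses
qs <- quantile(N.new, c(0.05, 0.95))   # empirical 5% and 95% quantiles
print(qs)
# hist(N.new, prob = TRUE)             # optional: histogram of the simulated estimates
```

Because each simulated estimate is 3 times an integer count, the empirical quantiles fall on multiples of 3 near the normal-theory limits 259.7 and 340.3 rather than matching them exactly.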