Counting coupons in a box

I have a question about counting. I come from a background in quantum computing, so I'm not familiar with the relevant literature here. For the interested reader, I provide more details below about the quantum computing background and how I arrived at this problem.

The "counting" question I want to solve is: Suppose a box has $N$ distinct coupons numbered $1,2,\dotsc,N$. Each turn, I randomly pick one coupon from the box, look at the number on the coupon, and then put it back in the box. My goal is to estimate the total number $N$. Approximately how many times should I draw a coupon (and put it back) to get a reasonably good estimate of $N$ with reasonably high probability? To be rigorous: after $M$ turns, what is the probability $$P\left(\frac{|N_{\text{estimate}}-N|}{N} <\epsilon\right)?$$
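One way to experiment with this question is a direct simulation (a minimal sketch in Python; the values of `N` and `M`, and the choice of the sample maximum as the estimator, are illustrative assumptions, not part of the question):

```python
import random

# Simulate M draws with replacement from coupons numbered 1..N.
# N and M are arbitrary illustration values.
rng = random.Random(0)
N, M = 1000, 200
draws = [rng.randint(1, N) for _ in range(M)]  # randint is inclusive on both ends

# Any estimator of N is some function of the draws; the simplest
# candidate is the largest coupon number observed.
n_estimate = max(draws)
relative_error = abs(n_estimate - N) / N
```

Repeating this over many independent trials lets one tabulate the probability that the relative error falls below a chosen $\epsilon$.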

Now, for the background: quantum annealers are used to reach the ground state of a desired Hamiltonian. I want to count the number of ground states of this Hamiltonian. I do this by repeatedly measuring the wave function, which is like randomly picking a coupon from a box, as in the above problem. I have ideas to ensure that each coupon is picked with equal probability (called "fair sampling").

There are 2 answers below.


I will consider only the estimator that takes the largest number you have seen so far. As mentioned in the comments, one might consider something slightly larger than this, since the maximum seen never exceeds (and typically underestimates) $N$ (e.g., one could correct the bias of this estimator). Which estimator you choose will depend on your context (do you incur a severe penalty if you overestimate $N$?).

The probability that your guess is $\le N-n$ (for $0 \le n \le N$) is the same as the probability that you never draw $N-n+1, N-n+2, \ldots, N$ in the $M$ turns. $$P(N_{est} \le N-n) = \left(1 - \frac{n}{N}\right)^M.$$ Rearranging yields $$P\left(\frac{N - N_{est}}{N} \ge \frac{n}{N} \right) = \left(1 - \frac{n}{N}\right)^M.$$
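This formula is easy to sanity-check by Monte Carlo (a sketch in Python; the specific values of `N`, `M`, `n`, the seed, and the trial count are arbitrary assumptions):

```python
import random

def estimate_max(N, M, rng):
    """Estimate N as the largest of M draws with replacement from 1..N."""
    return max(rng.randint(1, N) for _ in range(M))

def p_underestimate(N, M, n):
    """The formula above: P(N_est <= N - n) = (1 - n/N)^M."""
    return (1 - n / N) ** M

rng = random.Random(0)
N, M, n = 100, 50, 5
trials = 20000
hits = sum(estimate_max(N, M, rng) <= N - n for _ in range(trials))
empirical = hits / trials
# `empirical` should be close to p_underestimate(100, 50, 5) = 0.95^50, about 0.077
```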


It seems that my question is closely related to the capture-recapture problem, as pointed out by Ross Millikan: Suppose a forest has a population of $N$ animals of one species; how many animals should I capture to estimate $N$? The solution goes as follows: On my first field visit, I tag $M$ individuals at random. On my second field visit, I capture $M$ individuals at random and find that $k$ of these have already been tagged. Then I can use $\frac{k}{M} \approx \frac{M}{N} \Rightarrow N \approx \frac{M^2}{k}$. For this to work, I need $k\geq 1$, which in expectation requires $M\gtrsim\sqrt{N}$. That is, I can estimate $N$ by tagging only $\sim\sqrt{N}$ individuals.
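The estimator described here is essentially the Lincoln-Petersen estimator, and it can be simulated directly (a sketch in Python; the values of `N` and `M`, the seed, and the trial count are illustrative assumptions):

```python
import random

def lincoln_petersen(N, M, rng):
    """One capture-recapture experiment: tag M of N individuals, recapture M,
    and estimate N as M^2 / k, where k counts the tagged recaptures."""
    tagged = set(rng.sample(range(N), M))      # first visit: tag M individuals
    recaptured = rng.sample(range(N), M)       # second visit: capture M individuals
    k = sum(i in tagged for i in recaptured)   # tagged individuals seen again
    if k == 0:
        return None  # estimate undefined; we need M large enough that k >= 1
    return M * M / k

rng = random.Random(1)
N = 10_000
M = 300  # about 3 * sqrt(N), so k is around M^2 / N = 9 on average
estimates = [e for e in (lincoln_petersen(N, M, rng) for _ in range(200)) if e is not None]
avg = sum(estimates) / len(estimates)
# `avg` should land within a factor of roughly 2 of the true N
```

Note that when $M$ is much smaller than $\sqrt{N}$, most experiments return $k=0$ and the estimate is undefined, which is the $M\gtrsim\sqrt{N}$ requirement in action.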

This is the answer I was looking for, and I separately obtained the same answer by a more circuitous route. It is helpful to have my intuition confirmed; nevertheless, it is surprising that I can estimate $N$ with only $\sim\sqrt{N}$ samples.

I think I have a handle on answering the further question, "What is the probability that this estimate is the right answer with relative accuracy $\epsilon$?". Thank you all!