Determine Population Size from sample.

I have a set of unknown size $N$. I can cheaply query a random item from this set. I need to estimate $N$ to within a given confidence range.

I have already set up a structure where I can query an item, determine whether it has occurred before, and update my overlap counter. My issue is how to take a sequence of occurrence counts and infer the size of the set they were sampled from.

To give a real example:

Consider the set $s := \{1, 2, 3, 4, 5, 6\}$.

Eight random samples yielded $2, 2, 4, 5, 6, 2, 3, 1$. This sequence is processed into this map:

2 -> 3,
4 -> 1,
5 -> 1,
6 -> 1,
3 -> 1,
1 -> 1
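
For concreteness, here is a minimal Python sketch of how the eight draws listed above turn into such a map. The variable names are illustrative, not part of my actual structure. Counting the draws directly gives six distinct values, the single draw of 1 included.

```python
from collections import Counter

# The eight draws from the example above.
samples = [2, 2, 4, 5, 6, 2, 3, 1]

# Map each observed value to the number of times it was drawn.
counts = Counter(samples)

n_draws = sum(counts.values())      # total number of draws: 8
n_distinct = len(counts)            # number of distinct values seen: 6
n_repeats = n_draws - n_distinct    # draws that hit an already-seen value: 2

print(dict(counts))
print(n_draws, n_distinct, n_repeats)
```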

How could I infer set size from this map?

My research turned up this formula for inferring population size from discrete random samples: $$ P\left(N\ |\ {s}_1,{s}_2,o\right)\propto P\left(\ o\ |\ N,{s}_1,{s}_2\right)\times P(N). $$ It comes from this paper, but it involves separate random samples, and I cannot adapt it to handle a single sample.
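
One possible single-sample analogue, written here only as a sketch: assume the $n$ draws are independent and uniform with replacement, and summarize them by the number $d$ of distinct values seen. The symbols $n$, $d$, and the Stirling number of the second kind $S(n,d)$ are notation introduced for this sketch, not taken from the paper. The same Bayes structure then reads $$ P\left(N\ |\ n,d\right)\propto P\left(\ d\ |\ N,n\right)\times P(N), \qquad P\left(\ d\ |\ N,n\right)=S(n,d)\,\frac{N!}{(N-d)!\,N^{n}}. $$ Since $S(n,d)$ does not depend on $N$, it cancels when the posterior is normalized.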

I also had difficulty finding answers that gave confidence intervals. Since the solution will ultimately be used in an algorithm to determine total size, I plan to simply keep sampling until my confidence falls within a given range.
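
To make the stopping idea concrete, here is a rough Python sketch of the kind of calculation I have in mind. It is not a known-correct method. It assumes uniform draws with replacement, a flat prior on $N$ up to a hypothetical cap `n_max`, and that only the number of draws and the number of distinct values matter. The function names `posterior_over_n` and `credible_interval` are made up for this illustration.

```python
import math

def posterior_over_n(n_draws, n_distinct, n_max):
    """Posterior over N given n_distinct distinct values in n_draws draws.

    Assumes a flat prior on N in [n_distinct, n_max] (an assumption, not a
    given). The likelihood is proportional to N! / ((N - d)! * N**n); factors
    that do not depend on N cancel in the normalization.
    """
    if not (1 <= n_distinct <= n_draws) or n_max < n_distinct:
        raise ValueError("need 1 <= n_distinct <= n_draws and n_max >= n_distinct")
    log_w = {}
    for N in range(n_distinct, n_max + 1):
        log_w[N] = (math.lgamma(N + 1)
                    - math.lgamma(N - n_distinct + 1)
                    - n_draws * math.log(N))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_w.values())
    weights = {N: math.exp(v - peak) for N, v in log_w.items()}
    total = sum(weights.values())
    return {N: w / total for N, w in weights.items()}

def credible_interval(post, mass=0.95):
    """Equal-tailed interval containing roughly `mass` of the posterior."""
    tail = (1.0 - mass) / 2.0
    lower, upper = None, max(post)
    cumulative = 0.0
    for N in sorted(post):
        cumulative += post[N]
        if lower is None and cumulative >= tail:
            lower = N
        if cumulative >= 1.0 - tail:
            upper = N
            break
    return lower, upper

# The example from this question: 8 draws, 6 distinct values.
post = posterior_over_n(n_draws=8, n_distinct=6, n_max=1000)
mode = max(post, key=post.get)
lower, upper = credible_interval(post, mass=0.95)
print(mode, lower, upper)
```

With so few repeats the posterior tail decays slowly, so the interval here depends heavily on the chosen `n_max` and on the prior. The stopping rule would be to keep drawing, recompute the interval, and stop once `upper - lower` is within the target width.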

Any help is appreciated! I'm half sure there's just some statistical law that everyone but me knows that answers this question.