Estimating the number of books in the world from randomly chosen overlapping lists

Suppose I have lists $L_1, \dots, L_n$ of, say, books. Assume further that each list is chosen uniformly at random from the set of all books (probably unrealistic for obvious reasons; if this assumption can be weakened, I'd be even more impressed). Suppose the lists have sizes $k_1, \dots, k_n$ and that lists $L_i$ and $L_j$ have $a_{ij}$ books in common (that is, $|L_i \cap L_j| = a_{ij}$). How can we best use this data to estimate the total number of books in existence?

The mark and recapture method gives a solution for the case $n = 2$, which also serves as a worst-case baseline for estimating the total number of books. Can we do better, though, given that we (presumably) have more than 2 lists?
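For concreteness, here is a minimal Python sketch of the $n = 2$ mark and recapture (Lincoln-Petersen) estimate, $\hat{N} = k_1 k_2 / a_{12}$, together with one possible way to pool all pairs. The pooled version is an assumption on my part rather than an established answer to the question: if the lists are independent uniform samples, then $E[a_{ij}] = k_i k_j / N$, so dividing the sum of the $k_i k_j$ by the sum of the $a_{ij}$ gives a simple method-of-moments estimate. The function names and the dictionary layout for the overlaps are hypothetical.

```python
from itertools import combinations


def lincoln_petersen(k1, k2, overlap):
    """Two-list mark and recapture estimate: N is roughly k1 * k2 / overlap."""
    if overlap == 0:
        raise ValueError("lists share no books, so the estimate is unbounded")
    return k1 * k2 / overlap


def pooled_pairwise_estimate(sizes, overlaps):
    """Method-of-moments estimate pooled over all pairs of lists.

    sizes[i] is k_i; overlaps[(i, j)] is a_ij for i < j.
    Assumes independent uniform samples, so E[a_ij] = k_i * k_j / N.
    """
    pairs = list(combinations(range(len(sizes)), 2))
    expected_scale = sum(sizes[i] * sizes[j] for i, j in pairs)
    total_overlap = sum(overlaps[(i, j)] for i, j in pairs)
    if total_overlap == 0:
        raise ValueError("no overlaps observed, so the estimate is unbounded")
    return expected_scale / total_overlap


# Made-up illustrative numbers, not real data.
sizes = [100, 200, 300]
overlaps = {(0, 1): 20, (0, 2): 30, (1, 2): 60}
print(lincoln_petersen(sizes[0], sizes[1], overlaps[(0, 1)]))
print(pooled_pairwise_estimate(sizes, overlaps))
```

With two lists the pooled estimate reduces to the Lincoln-Petersen formula; with more lists it uses every observed overlap instead of discarding all but one pair.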

Best answer

This is a binomial distribution problem where you're trying to deduce the true proportion $p$ from the observed proportion $\hat{p}$. Compile all your lists into one large dataset, fix some confidence level $d$, and construct a confidence interval for a proportion.
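As a rough illustration of the last step, the sketch below computes a normal-approximation (Wald) confidence interval for a proportion at confidence level $d$. The answer does not say which interval to use or exactly which counts play the role of successes and trials, so the choice of the Wald interval, the function name, and the example counts are all assumptions.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, trials, confidence=0.95):
    """Wald (normal-approximation) interval for a binomial proportion.

    Reasonable only when trials is large and the observed proportion
    is not too close to 0 or 1.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    p_hat = successes / trials  # observed proportion
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Made-up counts for illustration only.
low, high = proportion_confidence_interval(50, 100, confidence=0.95)
print(low, high)
```

For small samples or proportions near 0 or 1, an interval such as Wilson's is usually a safer choice than the Wald interval.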