Estimating the number of books in the world from randomly chosen overlapping lists

52 Views Asked by Bumbble Comm At 01 Apr 2026 - 11:59

Suppose I have lists $L_1, \dots, L_n$ of, say, books. Assume further that each list is chosen uniformly at random from the set of all books (probably unrealistic for obvious reasons; if this assumption can be weakened, I'd be even more impressed). Suppose the lists have sizes $k_1, \dots, k_n$ and that lists $L_i$ and $L_j$ have $a_{ij}$ books in common (that is, $|L_i \cap L_j| = a_{ij}$). How can we best use this data to estimate the total number of books in existence?

The mark and recapture method gives a solution for the case $n = 2$, and applying it to a single pair of lists gives a crude, worst-case way of estimating the total number of books. Can we do better, though, since we presumably have more than 2 lists?
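For concreteness, here is a minimal sketch of the two-list mark and recapture (Lincoln-Petersen) estimate $N \approx k_1 k_2 / a_{12}$, together with one possible way to pool all pairs. The pooled version is an assumption, not an established answer to the question: it relies on $E[a_{ij}] = k_i k_j / N$ under independent uniform sampling and simply sums both sides over all pairs. The function names are hypothetical.

```python
from itertools import combinations


def lincoln_petersen(k1, k2, overlap):
    """Two-list mark and recapture estimate: N ~ k1 * k2 / overlap."""
    if overlap == 0:
        raise ValueError("no overlap between the lists: the estimate is unbounded")
    return k1 * k2 / overlap


def pooled_pairwise_estimate(lists):
    """Illustrative pooling over all pairs (an assumption, not a standard result).

    Under independent uniform sampling, E[a_ij] = k_i * k_j / N, so summing
    over pairs gives N ~ sum(k_i * k_j) / sum(a_ij).
    """
    sets = [set(items) for items in lists]
    size_products = 0
    total_overlap = 0
    for s, t in combinations(sets, 2):
        size_products += len(s) * len(t)
        total_overlap += len(s & t)
    if total_overlap == 0:
        raise ValueError("no pair of lists overlaps: the estimate is unbounded")
    return size_products / total_overlap
```

With two lists the pooled estimate reduces to the Lincoln-Petersen formula; with more lists it uses every $a_{ij}$ rather than a single pair, although the pairwise overlaps are not independent of one another.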

There is 1 best solution below

Bumbble Comm On 25 Dec 2015 - 8:05 BEST ANSWER

This is a binomial distribution problem in which you are trying to infer the true proportion $p$ from the observed proportion $\hat{p}$. Pool all your lists into one large dataset, fix a confidence level $d$, and construct a confidence interval for a proportion.
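The answer does not say which interval it has in mind, so the following is only a sketch under an assumption: it uses the Wilson score interval for a binomial proportion, with the observed proportion taken as `successes / trials`. The function name and parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, trials, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion.

    Assumption: the answer names no specific interval; Wilson is one common choice.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials
                          + z * z / (4 * trials * trials)) / denom
    return centre - half_width, centre + half_width
```

One possible reading, again an assumption: with two lists, the fraction $a_{12} / k_2$ of the second list that also appears in the first estimates $k_1 / N$, so an interval $(p_{lo}, p_{hi})$ for that proportion translates into the interval $(k_1 / p_{hi}, k_1 / p_{lo})$ for $N$.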
