Estimate the number of typos there are in a book, based on two editors' finds


This is a question from an interview I just had:

Suppose there is a book full of typos. Tom and Jerry found $x$ and $y$ typos throughout the book, respectively. There are $z$ typos that they both found.

The question is: how can one estimate the total number of typos in the book?

I find this question quite interesting but do not know how to deal with it. Can anyone give me a hint on this? Thanks!


There are 3 best solutions below

5
On BEST ANSWER

Assume Tom finds any given typo with probability $p$ and Jerry with probability $q$. Let $T$ be the total number of typos.

Then, in expectation, Tom finds $x \approx Tp$ typos, Jerry finds $y \approx Tq$, and the number they both find is $z \approx Tpq$. Dividing the last expression by the second gives $p \approx \frac{z}{y}$, so $T \approx \frac{x}{p} = \frac{xy}{z}$.

This assumes $z>0$, and it assumes that Tom finding a given typo is independent of Jerry finding it (which is unlikely to hold exactly, but it might be a "good enough" model).
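
As a minimal sketch of the estimate $T\approx\frac{xy}{z}$ (known in capture-recapture work as the Lincoln-Petersen estimator), the following Python snippet computes it under the same independence assumption. The function name `estimate_total_typos` and the sample counts are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def estimate_total_typos(x: int, y: int, z: int) -> Fraction:
    """Return the estimate T = x*y/z, assuming independent finds."""
    if z <= 0:
        # The formula divides by z, so it is undefined without overlap.
        raise ValueError("the estimate x*y/z needs z > 0")
    if z > min(x, y):
        # The overlap cannot exceed either editor's own count.
        raise ValueError("z cannot exceed x or y")
    return Fraction(x * y, z)


# Hypothetical counts: Tom found 30, Jerry found 20, 12 in common.
x, y, z = 30, 20, 12
total = estimate_total_typos(x, y, z)
found = x + y - z  # distinct typos actually found by at least one editor
print(f"estimated total: {float(total)}, found so far: {found}")
```

With these made-up counts the estimate is 50 typos, of which 38 have actually been found, so roughly 12 would remain undiscovered if the model holds.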

0
On

What is the minimum number of typos possible? That occurs when one editor's finds are entirely contained in the other's. For instance, suppose $x=3, y=4$. The least possible number of typos found would be when they overlap as much as possible, so that $z=3$ and all together they found $4$ typos.

What is the maximum number of typos possible? That occurs when they found none of the same typos, so again if $x=3, y=4$ then we want $z=0$ so that they found $7$ typos together.

In general the relationship is that the number of typos they've found is $x+y-z$. To maximize this for a fixed $x$ and $y$, you minimize $z$. To minimize this, you maximize $z$.
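
The bounds described above can be sketched in a few lines of Python. The helper names `found_count` and `found_count_bounds` are hypothetical; they only restate $x+y-z$ and its extreme values for fixed $x$ and $y$.

```python
def found_count(x: int, y: int, z: int) -> int:
    """Distinct typos found by at least one editor: x + y - z."""
    return x + y - z


def found_count_bounds(x: int, y: int) -> tuple[int, int]:
    """Smallest and largest possible value of x + y - z for fixed x, y."""
    lowest = found_count(x, y, min(x, y))  # maximal overlap, z = min(x, y)
    highest = found_count(x, y, 0)         # no overlap, z = 0
    return lowest, highest


# The example from the text: x = 3, y = 4.
print(found_count_bounds(3, 4))
```

For $x=3, y=4$ this gives the range from $4$ to $7$, matching the two cases worked out above.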


So now you know how many distinct typos they found in the book together: $x+y-z$. If you want to estimate the total number of typos in the book, supposing that they both missed some, then you'll need to quantify how many typos they might have overlooked. There are probably many ways to do this, but one would be to suppose that for every $m$ typos they found, there were $n$ typos they didn't find. There probably isn't just one correct pair of numbers to use for this, but one plausible guess uses what each editor is known to have missed: for Tom the found-to-missed ratio is $x:(y-z)$, since Jerry found $y-z$ typos that Tom did not, and for Jerry it is $y:(x-z)$.

6
On

If I understood the question properly:

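Let $A$ be the set of typos Tom found, $B$ the set of typos Jerry found, and $C$ the set of typos they both found. Note that $C = A \cap B$.

The number of distinct typos found by at least one of them is: $\operatorname{card}(A \cup B) = \operatorname{card}(A) + \operatorname{card}(B) - \operatorname{card}(A \cap B) = \operatorname{card}(A) + \operatorname{card}(B) - \operatorname{card}(C) = x + y - z$. This is a lower bound on the total, since typos that neither of them found are not counted.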
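
The inclusion-exclusion count can be checked directly with Python sets. The typo identifiers below are made up for illustration; any hashable labels (page and line numbers, for example) would work the same way.

```python
# Hypothetical typo identifiers found by each editor.
tom = {1, 2, 3, 5, 8}    # A, so x = 5
jerry = {2, 3, 4, 8}     # B, so y = 4
both = tom & jerry       # C = A intersect B, so z = 3

x, y, z = len(tom), len(jerry), len(both)
found_by_either = tom | jerry  # A union B

# card(A union B) = card(A) + card(B) - card(A intersect B)
assert len(found_by_either) == x + y - z
print(len(found_by_either))
```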