Probability - Interview Question - Hidden Assumptions and Phrasing Issues

Question

616 Views Asked by Bumbble Comm At 27 Mar 2026 - 10:06

I’ve encountered the following seemingly simple probability interview question at my workplace:

Two reviewers were tasked with finding errors in a book. The first found 40 errors and the second found 60; 20 of these errors were found by both. Give an estimate of the number of errors in the book.

A few clarifications were given:

The errors are not false positives.
The probability of the reviewers to find any error is independent of each other. (Problematic phrasing?)
The trivial lower bound (i.e., at least $40 + 60 - 20 = 80$ errors) is not the answer being sought.

It was my opinion that this problem is not well defined and any answer would rely on hidden assumptions.

My coworker said that the solution is easy to calculate using the following method, where $x$ denotes the total number of errors:

$$P(A) = \frac{40}{x}, \qquad P(B) = \frac{60}{x}, \qquad P(A\cap B) = \frac{20}{x}$$
$$P(A\cap B) = P(A) \cdot P(B)$$
$$\frac{20}{x} = \frac{40}{x} \cdot \frac{60}{x}$$
$$20x = 2400$$
$$x = 120$$

I found this answer unsatisfying, but I am struggling to coherently explain why. I believe there are various assumptions hidden in the above “solution”.

I need help identifying these assumptions or phrasing issues with the question itself that make it not well defined. It could be that I’m mistaken and the problem is well defined and I’ve complicated it.

I am also interested in alternative solutions that could be based on different assumptions but don’t negate the clarifications made.

There are 2 best solutions below

Bumbble Comm On 26 Dec 2023 - 10:20

You have good reason to suspect this analysis. An obvious way to see why this estimate cannot be appropriate is to observe that if $0$ errors are found in common between the two reviewers, this would imply $$\frac{0}{x} = \frac{40}{x} \cdot \frac{60}{x},$$ and the estimate is $x = \infty$ errors. This is obviously absurd. The number of errors might be very large, but it cannot be infinite even when there is a small but nonzero probability that no common errors are found. For instance, in a book with $x = 10$ errors, if reviewer $A$ finds $2$ errors and $B$ finds $3$ errors, it is quite reasonable to think that none of those errors are common.
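The breakdown described above can be reproduced with a minimal Python sketch. The function name `naive_estimate` is hypothetical and simply solves the coworker's equation for $x$; it is an illustration of the argument, not a recommended estimator.

```python
import math

def naive_estimate(n_a: int, n_b: int, n_common: int) -> float:
    """Solve n_common/x = (n_a/x) * (n_b/x) for x, i.e. x = n_a * n_b / n_common."""
    if n_common == 0:
        # The equation 0 = n_a * n_b / x**2 has no finite solution.
        return math.inf
    return n_a * n_b / n_common

print(naive_estimate(40, 60, 20))  # 120.0
print(naive_estimate(2, 3, 0))     # inf
```

The second call is the $x = 10$ scenario from the paragraph above: the method reports infinitely many errors even though only ten exist.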

If I were to take the time to solve this question, I'd first state some additional but reasonable assumptions:

Within each reviewer, each error has the same fixed probability of being discovered.
Within each reviewer, the probability that a given error is found is independent of any other errors.

Such a model would involve a binomial and/or hypergeometric distribution approach, and estimating $x$ would be done by maximum likelihood.

If we do assume such a model, then what is the probability of the aforementioned outcome: $x = 10$ errors, but $N_A = 2$ and $N_B = 3$ are found, and $N_C = 0$ common errors?
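One possible way to answer this, sketched under an extra assumption: condition on the observed counts $N_A = 2$ and $N_B = 3$. The per-reviewer detection probabilities then drop out, and the errors found by $B$ behave like a uniformly random $3$-subset of the $10$ errors, so the overlap is hypergeometric:

$$\Pr[N_C = 0 \mid x = 10,\ N_A = 2,\ N_B = 3] = \frac{\binom{2}{0}\binom{8}{3}}{\binom{10}{3}} = \frac{56}{120} = \frac{7}{15} \approx 0.47.$$

So under this model, finding no common errors would happen almost half the time.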

If $x$ is unknown, what is the corresponding likelihood function for $x$ in the above case?
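A hedged sketch of one such likelihood, in Python. It assumes we condition on the counts $N_A$ and $N_B$, so that the overlap $N_C$ is hypergeometric; the function name `likelihood` and the search range are illustrative choices, not part of the original answer.

```python
from fractions import Fraction
from math import comb

def likelihood(x: int, n_a: int, n_b: int, n_common: int) -> Fraction:
    """P(overlap = n_common | x errors), conditioning on the counts n_a and n_b.

    Assumption: given that B found n_b errors, every n_b-subset of the x errors
    is equally likely, so the overlap with A's n_a errors is hypergeometric.
    """
    if x < n_a + n_b - n_common:
        return Fraction(0)  # fewer errors than the distinct ones already found
    return Fraction(comb(n_a, n_common) * comb(x - n_a, n_b - n_common), comb(x, n_b))

# No common errors: the likelihood keeps increasing with x, so no finite maximum.
for x in (5, 10, 100, 1000):
    print(x, float(likelihood(x, 2, 3, 0)))

# The question's numbers: the likelihood rises up to x = 119, ties at 120, then falls.
L = {x: likelihood(x, 40, 60, 20) for x in range(80, 301)}
print(max(L.values()) == L[119] == L[120])  # True
```

Under these assumptions the maximum-likelihood estimate for the question's numbers lands at $119$ or $120$, in line with the coworker's figure, while the zero-overlap case has no finite maximizer at all.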

**Bumbble Comm** · Accepted Answer

Let $A_i \sim \text{Ber}(p)$ be a random variable indicating whether person $A$ found error $i$, and let $B_i$ be the analogous variable for person $B$. The proposed solution assumes that $\mathbb{P}(A_i = 1) = \mathbb{P}(A_j = 1)$ for all $i, j$, which doesn't feel right. For example, if the errors are typos, then a typo like "dwjaiodajwio" is far more obvious than writing "there" instead of "their". We should also consider types of errors: maybe person $B$ is better at finding grammatical errors than person $A$, while person $A$ can find all of the spelling errors.

If we choose to assume this, then $\mathbb{P}(A_i = 1) = \frac{40}{x}$ is still incorrect. Let $A \sim \text{Bin}(x, \frac{40}{x})$ and $B \sim \text{Bin}(x, \frac{60}{x})$ be the numbers of errors found by each person. Then $\mathbb{E}[A] = 40$ and $\mathbb{E}[B] = 60$ given $x$ total errors, but these are only expectations; on any single trial the observed counts need not equal them. The biggest problem here is that the solution treats this one trial as if it were equal to the expectation.

The answer given assumes that the number of errors found equals its true expectation (i.e. $A = x \cdot \frac{40}{x} = 40 = \mathbb{E}[A]$). That is the big "hidden assumption" that the answer makes without saying so. On the basis of a single trial this is ridiculous to assume, and the other answer, from heropup, shows why it becomes a problem if $0$ errors are found in common. You are certainly correct that this is not a well-defined problem; these things should have been specified for it to make sense.
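To get a feel for how strong this hidden assumption is, here is a small Python sketch. The value `x = 120` is purely hypothetical (chosen to match the coworker's answer), and the helper `binom_pmf` is an illustrative name; the sketch just asks how often $A \sim \text{Bin}(x, \frac{40}{x})$ actually lands on its expectation.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(A = k) for A ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

x = 120          # hypothetical true number of errors (illustration only)
p = 40 / x       # detection probability chosen so that E[A] = 40

p_exact = binom_pmf(40, x, p)
p_within_5 = sum(binom_pmf(k, x, p) for k in range(35, 46))
print(f"P(A = 40)        = {p_exact:.3f}")
print(f"P(35 <= A <= 45) = {p_within_5:.3f}")
```

Even when the model is exactly right, the chance that a single trial hits the expectation is well below one in ten, so equating the observed count with $\mathbb{E}[A]$ is a substantial assumption.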

It would be hard to get an estimate of the true probability, since we don't know the true number of errors or how the errors behave. By analogy, suppose a disease is spreading through a country and we know that at least $100$ people are sick out of some population. It is hard to estimate the number of sick people when we know literally nothing about the disease: it could be exactly $100$ if the disease is a rare genetic condition, or $100{,}000$ if it is like the common cold. We don't know, and no estimate will feel entirely satisfactory, because any estimate requires major assumptions about the data.

Final edit: What if they found $40$ errors in common, while $A$ still found $40$ and $B$ still found $60$? Then the same method would lead us to expect only $60$ errors in total, but it makes no sense to simply assume that $B$ is perfect.
