Proofreading: iteratively estimating the number of misprints, and when to stop

In the book Introduction to Probability and Statistics for Engineers and Scientists, the author describes how to estimate the number of misprints $N$ in a manuscript from the errors found by $m$ proofreaders ($m \geq 3$). Assuming that each error is found independently by proofreader $i$ with probability $p_{i}$, $i = 1, 2, \ldots, m$, we can perform the following steps:

Let $n_{f}$ be the number of errors that are found by at least one proofreader. Because $\frac{n_{f}}{N}$ is the fraction of errors that are found by at least one proofreader, this should approximately equal $1 - \prod_{i=1}^{m}(1-p_{i})$, the probability that an error is found by at least one proofreader.

Therefore, we have $\frac{n_{f}}{N} \approx 1 - \prod_{i=1}^{m}(1-p_{i})$,

suggesting that $N \approx \hat{N}$, where $\hat{N} = \frac{n_{f}}{1 - \prod_{i=1}^{m}(1-p_{i})}$ (7.2.1).

With this estimate of $N$, we can then reset our estimates of the $p_{i}$ by using

$p_{i} = \frac{n_{i}}{\hat{N}}$, $i = 1, \ldots, m$ (7.2.2),

where $n_{i}$ is the number of errors found by proofreader $i$.

We can then reestimate $N$ by using the new values of the $p_{i}$ in Equation 7.2.1. (The estimation need not stop here; each time we obtain a new estimate $\hat{N}$ of $N$ we can use Equation 7.2.2 to obtain new estimates of the $p_{i}$, which can then be used to obtain a new estimate of $N$, and so on.)
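To make the procedure concrete, here is a minimal Python sketch of the loop between Equations 7.2.1 and 7.2.2. It is not from the book: the function name `estimate_misprints`, the initial guesses `p_init`, the relative-change stopping rule with tolerance `tol`, and the counts in the usage example are all my own assumptions, chosen only for illustration.

```python
from math import prod


def estimate_misprints(n_found, n_by_reader, p_init, tol=1e-9, max_iter=1000):
    """Alternate between Equations 7.2.1 and 7.2.2 until N-hat stabilizes.

    n_found     -- n_f, errors found by at least one proofreader
    n_by_reader -- [n_1, ..., n_m], errors found by each proofreader
    p_init      -- initial guesses for p_1, ..., p_m (assumed, not all zero)
    tol         -- assumed stopping rule: relative change in N-hat below tol
    """
    p = list(p_init)
    # Equation 7.2.1 with the initial p_i
    n_hat = n_found / (1.0 - prod(1.0 - p_i for p_i in p))
    for iteration in range(1, max_iter + 1):
        # Equation 7.2.2: refresh the p_i from the current N-hat
        p = [n_i / n_hat for n_i in n_by_reader]
        # Equation 7.2.1: refresh N-hat from the new p_i
        n_new = n_found / (1.0 - prod(1.0 - p_i for p_i in p))
        if abs(n_new - n_hat) <= tol * n_hat:
            return n_new, p, iteration
        n_hat = n_new
    raise RuntimeError("N-hat did not stabilize within max_iter iterations")


# Hypothetical counts, made up for illustration only
n_hat, p, iterations = estimate_misprints(
    n_found=90, n_by_reader=[60, 50, 40], p_init=[0.5, 0.5, 0.5]
)
print(n_hat, p, iterations)
```

Stopping when successive values of $\hat{N}$ agree to within a small relative tolerance is just one common choice for a fixed-point iteration like this. It only tells us that the iteration has settled, not how close $\hat{N}$ is to the true $N$.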

So my question is: how do we know when to stop the estimation? Is there a threshold or metric for assessing the accuracy?

Update: Or, I guess, maybe the estimate always converges after a large number of iterations.