Detection probability when there are multiple issues


We are conducting multiple rounds of sampling tests. I am trying to devise a limited sampling scheme that, with a given level of certainty, uncovers all the issues that existed in the population of the previous cycle (the full cycle). I assume that no new issue exists in the population of the next cycle (the limited cycle), and that issue occurrence in both cycles follows the same distribution.

The test results of the full cycle serve as the input and fall into one of three scenarios.

  1. Only one issue detected
  2. Multiple issues detected, and each item has at most one issue
  3. Multiple issues detected, and some items have more than one issue

Symbolically, the problem can be stated as follows.

  • The population size of the full cycle is $N$
  • The total number of issues found in the full cycle is $s$ ($s \gt 0$)
  • The number of items having a certain combination of issues is $M(I_1, I_2, \dots, I_s)$ $(M(\dots) \ge 0)$, where $I_i$ is an indicator of whether issue $i$ is present
  • The population size of the limited cycle is $N'$
  • The sample size of the limited cycle is $a$
  • The total number of issues found in the limited cycle is $s'$
  • Find the minimal $a$ which satisfies $P(s'=s|a)\ge P_0$, or alternatively calculate $P(s'=s|a)$, where $P(...)$ is the detection probability of the limited cycle

I can calculate the detection probability for scenario 1 based on combinations. For scenarios 2 and 3, I can derive the result if $s$ is a given constant, but have yet to arrive at a solution which works for an arbitrary $s$. Do you know of a general way to solve for scenarios 2 and 3, either precisely or approximately?
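To make the setup concrete, here is a minimal Python sketch of the combination-based calculation, extended to several issues by inclusion-exclusion. It rests on assumptions that are not part of the problem statement: the composition of the limited-cycle population is taken as known (for example, copied from the full cycle), and the $a$ items are drawn without replacement. The names `detection_probability`, `minimal_sample_size` and `pattern_counts` are hypothetical. The loop runs over all $2^s$ subsets of issues, so it is only practical for small $s$ and is not the general solution asked for.

```python
from itertools import combinations
from math import comb


def detection_probability(population_size, pattern_counts, sample_size):
    """Probability that a sample drawn without replacement shows every issue.

    pattern_counts maps a tuple of issue labels (the combination of issues
    carried by one item) to the number of items with exactly that combination.
    ASSUMPTION: these counts describe the limited-cycle population exactly.
    """
    issues = sorted(set().union(*pattern_counts))
    total = comb(population_size, sample_size)
    signed_sum = 0
    # Inclusion-exclusion over every subset T of issues: count the samples
    # that avoid all items carrying at least one issue in T.
    for k in range(len(issues) + 1):
        for subset in combinations(issues, k):
            chosen = set(subset)
            touched = sum(
                count
                for pattern, count in pattern_counts.items()
                if chosen & set(pattern)
            )
            signed_sum += (-1) ** k * comb(population_size - touched, sample_size)
    return signed_sum / total


def minimal_sample_size(population_size, pattern_counts, target):
    """Smallest sample size whose detection probability reaches target."""
    for a in range(population_size + 1):
        if detection_probability(population_size, pattern_counts, a) >= target:
            return a
    return None


# Scenario 1: one issue on 2 of 10 items, sample of 3.
p1 = detection_probability(10, {("A",): 2}, 3)
# Scenario 2: two issues, each on a different single item, 4 items, sample of 2.
p2 = detection_probability(4, {("A",): 1, ("B",): 1}, 2)
# Scenario 3: one item carries both issues, 4 items, sample of 1.
p3 = detection_probability(4, {("A", "B"): 1}, 1)
```

For scenario 1 the sum collapses to $1 - \binom{N'-m}{a} / \binom{N'}{a}$, where $m$ is the number of items carrying the single issue. Items with no issue need no entry in `pattern_counts`, since only `population_size` and the affected counts enter the formula.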

Thanks and Regards,

Aquila