Sequence of samplings from a fixed population

56 Views Asked by At

I have a population of size $N=100$ of which $f=33$ of them are bad.

I'd like to create $k=5$ subpopulations of size $m=20$ out of the given population that satisfies the following conditions.

  • There's at least a single subpopulation that has $t=15$ or more good elements.
  • All the subpopulations have strictly less than $t=15$ bad elements.

What's the probability of obtaining such subpopulations?

I can see this problem is related with hypergeometric distribution, however, since each sampling affects the other, I can't seem to apply hypergeometric distribution independently to each sampling to get the probability and then multiply them.

In short, is there a nice formula to either compute or approximate the probability I want?

1

There are 1 best solutions below

3
On BEST ANSWER

Unfortunately, I have not been able to formulate a complete solution, but it is possible to compute this probability using Mathematica:

Total[(Times @@ Binomial[20, #]) Length[Permutations[#]] & /@
   Select[IntegerPartitions[33, {5}, Range[15] - 1], Min[#] <= 5 &]] /
   Binomial[100, 33]

yields the result

$$\frac{2831739961878449752952}{3041201517260483945991} \approx 0.931125.$$

The solution involves computing the nonnegative integer partitions of $33$ into exactly $5$ terms, from the set $\{0, \ldots, 14\}$. These represent ways to distribute the $33$ "bad" elements into $5$ sub-populations such that no sub-population contains $15$ or more bad elements (the inclusion of $0$ means we allow sub-populations with no bad elements). Then from this list, we select those partitions for which there is at least one sub-population with at least $15$ good elements, which is equivalent to saying that the sub-population with the fewest bad elements has no more than $5$. This gives a list of ordered partitions. Then for each such partition, we calculate the number of distinct permutations of its terms, and compute the number of ways to insert "good" elements into each sub-population to make $5$ groups of $20$. Multiplying these together for each partition of $33$ and then taking their sum over all such partitions gives the total number of arrangements of ordered $100$-tuples from $\{0,1\}^{100}$ that satisfy the given criteria, and $\binom{100}{33}$ enumerates the total number of unrestricted $100$-tuples.

Then, we can perform a simulation to verify this result:

a = IntegerDigits[2^33 - 1, 2, 100];
Tally[ParallelTable[(Min[#] <= 5 && Max[#] <= 14) &[Total /@ 
   Partition[RandomSample[a], 20]], {10^7}]]

This gave me the output {{True, 9311285}, {False, 688715}} which for $10^7$ simulations is consistent with the probability computed above. a is simply a list of $67$ zeroes and $33$ ones. Then RandomSample creates a random permutation of this list, and Partition splits this list into $5$ groups of $20$. Then Total /@ counts the number of ones (the 'bad' elements) in each group, and we test to see that there is at least one group with $5$ or fewer bad elements, and that no group has $15$ or more bad elements. If these criteria are met, the result is True; otherwise False. Then ParallelTable performs this computation $10^7$ times, and Tally counts up the true/false frequencies.