Probability of sampling at least one member of each class

Question

Probability of sampling at least one member of each class

135 Views Asked by Bumbble Comm At 02 Apr 2026 - 5:01

Suppose we have $N$ samples partitioned into $C$ classes with $N_1, ..., N_C$ the number of samples in each class. Randomly we pick up $n$ samples. What is the probability to have every class in this selection (at least one sample from each class)

For this problem, I came up with the solution presented here (https://math.stackexchange.com/users/72858/ethan-bolker)

The total number of samples of size $n$ picked up among $N$ is $\binom{N}{n}$ Then you have $N_1$ ways to choose one sample from the first class, $N_2$ from the second, ... $N_C$ from the last one. And finally $\binom{N-C}{n-c}$ ways to pick up indifferently the remaining samples.

$$\frac{N_1 N_2...N_C\binom{N-C}{n-c}}{\binom{N}{n}}$$

However with $N=10 000$, $N_1=N_2=...=N_C=1000$ and $n=100$, I get very high probabilities around $10^9$

So I thought I did a mistake and went with the complementary event "What is the probability of having at least one class that is NOT represented ?" Here we will take balanced classes $N_1=N_2=...=N_C$

There are $C$ ways to choose the class that is not represented then you have $\binom{N-N_1}{n}$ ways to pick up the $n$ samples among the $C-1$ classes that are represented (population of size $N-N_1$)

So the answer of the first problem would the be $$1- \frac{C\binom{N-N_1}{n}}{\binom{N}{n}}$$

However, once again with $N=10 000$, $N_1=N_2=...=N_C=1000$ and $n=10$ I will obtain negative or above 1 probabilities.

Could someone please tell me why these formulas are false

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

Consider for example $6$ samples, $2$ of which are blue (B1 and B2), $2$ of which are green (G1 and G2), and $2$ red (R1 and R2). Then we want to pick at least one sample from each color with $n=4$

With your initial formula, the choice $\{\text{B1,B2,G1,R1}\}$ gets counted twice! Indeed, we could first pick $\{\text{B1,G1,R1}\}$, one from each color, and then pick $\{\text{B2}\}$ from the remaining samples, or we could first pick $\{\text{B2,G1,R1}\}$, and then pick $\{\text{B1}\}$ from the remaining samples.

The bigger $n$ gets, the more incorrect double counting you do, so that you incorrectly get probabilities that exceed $1$.

Your second approach is also wrong due to double counting: Let's add two samples of color yellow (Y1 and Y2) to the above example. One way to not pick at least one sample from each color is the choice $\{B1, B2, G1, G2\}$. But you count this twice! Indeed, on could first pick to ignore the red color and then choose $\{B1, B2, G1, G2\}$, or one could first pick to ignore the yellow color and then choose $\{B1, B2, G1, G2\}$. Because you subtract incorrectly doubly-counted cases, you can incorrectly end up with negative probabilities.

Probability of sampling at least one member of each class

There are 1 best solutions below

Related Questions in PROBABILITY

Related Questions in COMBINATORICS

Related Questions in PROBABILITY-DISTRIBUTIONS

Related Questions in SAMPLING

Trending Questions

Popular # Hahtags

Popular Questions