Given G groups, and X samples per group. Probability of selecting N groups?

Question

210 Views Asked by Bumbble Comm At 27 Mar 2026 - 10:46

We are given a collection of $G \cdot X$ samples that is partitioned into $G$ groups. Each group has $X$ samples.

For example, for $G = 3$, $X = 2$ the collection is given by: $(A), (A), (B), (B), (C), (C)$.

We draw samples from the complete collection (with, or without replacement, it doesn't matter, whatever is easier to analyze). We randomly draw $D$ samples from the collection. Let's call the number of distinct groups in our obtained sample $N$. The question is, what is the probability of observing $N$ groups in our obtained sample?

For example, in the case above, suppose we draw 2 samples with replacement. Of the 9 equally likely ordered pairs of groups, 3 have both samples from the same group and 6 have samples from two distinct groups. Thus $P(N = 1|D = 2) = 1/3$, $P(N = 2|D = 2) = 2/3$.

Is there a general formula or expression to calculate or approximate $P(N|D)$? Clearly $P(N|D)$ has a maximum around $N \approx D$, but I'm trying to get a more accurate characterization. If it makes the analysis more convenient, $X$ could be taken to be large.

Original Q&A

Bumbble Comm On 14 Aug 2018 - 5:33

Here is the answer with replacement: $$ \boxed{P(N)=\frac1{G^D}\binom{G}N\sum_{j=0}^{N-1} (-1)^j \binom{N}{j}(N-j)^D.} $$ Note that this does not depend on $X$.

The $\binom{G}N$ accounts for choosing which $N$ of the $G$ groups are observed. The summation uses inclusion-exclusion to count the number of surjective functions from a set of size $D$ (the samples) to a set of size $N$ (the observed groups). Namely, start with all $N^D$ functions, then for each group subtract the $(N-1)^D$ functions which miss that group, then add back in for each pair of groups the $(N-2)^D$ functions which miss both of them, etc.
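The formula is easy to check numerically. Below is a minimal sketch in Python, assuming $D \ge 1$; the function names are illustrative and not part of the answer. It evaluates the boxed expression with exact fractions and compares it with a brute-force enumeration of all $G^D$ ordered draws.

```python
from fractions import Fraction
from itertools import product
from math import comb


def p_distinct_with_replacement(G, D, N):
    """Boxed formula: P(exactly N distinct groups in D draws), D >= 1."""
    # Inclusion-exclusion count of surjections from D draws onto N groups.
    surjections = sum((-1) ** j * comb(N, j) * (N - j) ** D for j in range(N))
    return Fraction(comb(G, N) * surjections, G ** D)


def brute_force_with_replacement(G, D):
    """Enumerate all G**D ordered draws and tally the number of distinct groups."""
    counts = {}
    for draw in product(range(G), repeat=D):
        n = len(set(draw))
        counts[n] = counts.get(n, 0) + 1
    return {n: Fraction(c, G ** D) for n, c in counts.items()}


if __name__ == "__main__":
    G, D = 4, 5
    exact = brute_force_with_replacement(G, D)
    for N in range(1, G + 1):
        print(N, p_distinct_with_replacement(G, D, N), exact.get(N, 0))
```

For the question's example ($G = 3$, $D = 2$) the formula gives $1/3$ for $N = 1$ and $2/3$ for $N = 2$, since the nine ordered draws are equally likely.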

**Bumbble Comm** · Accepted Answer

When drawing without replacement and counting all possible configurations we get the combinatorial class

$$\def\textsc#1{\dosc#1\csod} \def\dosc#1#2\csod{{\rm #1{\small #2}}} \textsc{SEQ}_{=G}(\textsc{SET}_{\le X}(\mathcal{Z})).$$

We obtain for the count

$$D! [z^D] (1+z/1!+z^2/2!+\cdots+z^X/X!)^G.$$

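Note however that this count treats the samples within a group as interchangeable. To attach probabilities we weight a set of size $k$ by $X^{\underline{k}} = X(X-1)\cdots(X-k+1)$, the number of ways to assign distinct samples of the group to its $k$ draws. Since $X^{\underline{k}}/k! = \binom{X}{k}$, each factor becomes $(1+z)^X$: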

$$D! [z^D] (1+ X z/1! + X(X-1) z^2/2! +\cdots+ X(X-1)..1 z^X/X!)^G \\ = D! [z^D] (1+z)^{GX} = D! {GX\choose D}.$$

This is the total number of ordered draws without replacement, $(GX)^{\underline{D}} = D! \binom{GX}{D}$, so it is the denominator by which we must divide to obtain probabilities.
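As a quick sanity check of this identity, take the question's example $G = 3$, $X = 2$, $D = 2$ (values chosen here only for illustration):

$$2!\,[z^2]\,(1 + 2z/1! + 2\cdot 1\, z^2/2!)^3 = 2!\,[z^2]\,(1+z)^6 = 2 {6\choose 2} = 30 = 6\cdot 5 = (GX)^{\underline{D}},$$

which is the number of ordered ways to draw two of the six samples without replacement.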

Marking sets of size zero with $\mathcal{U}$ we find

$$\textsc{SEQ}_{=G}(\mathcal{U} + \textsc{SET}_{1\le\cdot\le X} (\mathcal{Z})).$$

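Exactly $N$ groups are observed when exactly $G-N$ of the sets are empty, so we extract the coefficient of $u^{G-N}$ (using the weights from above), getting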

$$D! [z^D] [u^{G-N}] (u - 1 + (1+z)^X)^G \\ = D! [z^D] \frac{1}{(G-N)!} \left. \left(\frac{\partial}{\partial u}\right)^{G-N} (u - 1 + (1+z)^X)^G \right|_{u=0} \\ = D! [z^D] \frac{1}{(G-N)!} \left. G^{\underline{G-N}} (u -1 + (1+z)^X)^{G-(G-N)} \right|_{u=0} \\ = {G\choose G-N} D! [z^D] (-1 + (1+z)^X)^N \\ = {G\choose N} D! [z^D] \sum_{q=0}^N {N\choose q} (-1)^{N-q} (1+z)^{qX} \\ = {G\choose N} D! \sum_{q=0}^N {N\choose q} (-1)^{N-q} {qX\choose D}.$$

We thus have for the probability

$$\bbox[5px,border:2px solid #00A000]{ {GX\choose D}^{-1} {G\choose N} \sum_{q=0}^N {N\choose q} (-1)^{N-q} {qX\choose D}.}$$
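The boxed formula can also be evaluated and checked directly. The following is a minimal sketch in Python, assuming $1 \le D \le GX$; the function names are illustrative and not part of the answer. It computes the closed form with exact fractions and compares it with a brute-force pass over all $\binom{GX}{D}$ subsets of labelled samples.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def p_distinct_without_replacement(G, X, D, N):
    """Boxed formula: P(exactly N distinct groups in D draws without replacement)."""
    total = sum((-1) ** (N - q) * comb(N, q) * comb(q * X, D) for q in range(N + 1))
    return Fraction(comb(G, N) * total, comb(G * X, D))


def brute_force_without_replacement(G, X, D):
    """Enumerate every D-subset of the G*X samples and tally the distinct groups."""
    group_of = [g for g in range(G) for _ in range(X)]  # sample index -> group
    counts = {}
    for subset in combinations(range(G * X), D):
        n = len({group_of[i] for i in subset})
        counts[n] = counts.get(n, 0) + 1
    total = comb(G * X, D)
    return {n: Fraction(c, total) for n, c in counts.items()}


if __name__ == "__main__":
    G, X, D = 3, 3, 4
    exact = brute_force_without_replacement(G, X, D)
    for N in range(1, G + 1):
        print(N, p_distinct_without_replacement(G, X, D, N), exact.get(N, 0))
```

For $G = 3$, $X = 2$, $D = 2$ the formula gives $1/5$ for $N = 1$ and $4/5$ for $N = 2$. As $X$ grows with $G$, $D$, $N$ fixed, each ratio $\binom{qX}{D}/\binom{GX}{D}$ tends to $(q/G)^D$, so the result approaches the with-replacement probability.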

We may verify this formula by enumeration, which is shown below. This routine succeeds on a considerable range of values. Further optimization is possible, for example by restricting the partition iterator.

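with(combinat);

ENUM :=
proc(G, X, DV, N)
    option remember;
    local res, part, psize, mset, ff, probf;

    # ff(x, k) is the falling factorial x(x-1)...(x-k+1)
    res := 0; ff := (x, k) -> mul(x-q, q=0..k-1);

    # iterate over all integer partitions of DV (the number of draws D)
    part := firstpart(DV);

    while type(part, `list`) do
        psize := nops(part);

        # keep partitions into exactly N parts, none larger than X
        if psize = N and max(part) <= X then
            # ways to pick distinct samples inside each observed group
            probf := mul(ff(X, p), p in part);
            mset := convert(part, `multiset`);

            # choose the groups, split the draws among the parts,
            # and assign the part sizes to the chosen groups
            res := res + probf * binomial(G, psize) *
            DV!/mul(p!, p in part) *
            psize!/mul(p[2]!, p in mset);
        fi;

        part := nextpart(part);
    od;

    # divide by the number of ordered draws without replacement
    res/ff(G*X, DV);
end;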


X :=
proc(G, X, DV, N)
    binomial(G*X,DV)^(-1)*binomial(G,N)
    *add(binomial(N,q)*(-1)^(N-q)*binomial(q*X,DV),
         q=0..N);
end;
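For readers without Maple, the enumeration can be sketched in Python as well. This is a loose port under stated assumptions, not the answer's code: the names `partitions`, `falling` and `enum_probability` are illustrative, and the partition generator is already restricted to exactly $N$ parts of size at most $X$, which is one way to realize the optimization mentioned above.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial, prod


def partitions(n, k, max_part):
    """Yield partitions of n into exactly k parts, each in 1..max_part, non-increasing."""
    if k == 0:
        if n == 0:
            yield ()
        return
    # The first part must leave at least 1 for each of the remaining k - 1 parts.
    for first in range(min(n - (k - 1), max_part), 0, -1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest


def falling(x, k):
    """Falling factorial x (x-1) ... (x-k+1)."""
    return prod(x - i for i in range(k))


def enum_probability(G, X, D, N):
    """P(exactly N distinct groups) by summing over group-size partitions of D."""
    total = 0
    for part in partitions(D, N, X):
        within = prod(falling(X, p) for p in part)             # samples inside each group
        split = factorial(D) // prod(factorial(p) for p in part)  # draws -> parts
        mult = Counter(part).values()
        assign = factorial(N) // prod(factorial(m) for m in mult)  # part sizes -> groups
        total += comb(G, N) * within * split * assign
    return Fraction(total, falling(G * X, D))


if __name__ == "__main__":
    print([enum_probability(3, 2, 2, N) for N in (1, 2)])
```

The result can be compared with the closed form for small parameters; the two should agree exactly because both are computed with rational arithmetic.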
