Ways to get $<= n$ distinct items across $m$ samples of size $d$ from a set of size $s$

Question

Ways to get $<= n$ distinct items across $m$ samples of size $d$ from a set of size $s$

64 Views Asked by Bumbble Comm At 18 Apr 2026 - 10:37

Disclaimer: It's hard to state this question concisely, so don't confuse it with simpler selections questions. OTOH, I'm sure it's been asked, I just can't find it because I can't guess how someone else would phrase it.

Question:

Suppose I have a palette of $s$ distinct colors and have $m$ people independently choose their favorite $d$ colors from that set.

How many ways are there for them to choose $n$ or fewer distinct colors across their selections?
Is their a short name (or description) for this kind of selection / problem?

Motivation:

This is interesting because if 100 people chose their top 5 colors from a palette of 34 colors, it would be pretty surprising if they only came up with 6 distinct colors and as a group "neglected" the other 28. I am using this to determine a one-sided p-value for them having $n$ or fewer (i.e. very similar) selections.

Clarifications:

Each person chooses $d$ distinct colors (so each samples without replacement).
However, all the colors are available to everyone, so different people can include some or all the same colors in their selections.

My solution so far:

Lower Bound: $${{n \choose d}^m}$$ Upper Bound: $${s \choose n}{{n \choose d}^m}$$

The exact answer is in between these.

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Answer 1 · 2018-06-09 23:00:15

The question of getting exactly $n$ different colors points us to generalized Stirling numbers and hence may be computed by inclusion-exclusion the same as ordinary Stirling numbers. Supposing that we have chosen the $n$ colors from the $s$ available ones we classify choices according to the number $q$ of colors that are missing and obtain by PIE

$${s\choose n} \sum_{q=0}^n {n\choose q} (-1)^q {n-q\choose d}^m.$$

This yields for the probability of at most $n$ colors appearing

$$\bbox[5px,border:2px solid #00A000]{ {s\choose d}^{-m} \sum_{p=1}^n {s\choose p} \sum_{q=0}^p {p\choose q} (-1)^q {p-q\choose d}^m.}$$

I verified this formula using the Maple code included below.

with(combinat);

ENUM :=
proc(s, d, m, n)
option remember;
local src, res, part, psize, mset,
    admit, recurse;

    src := choose({seq(q, q=1..s)}, d);

    admit := [seq(0,  q=1..m)];

    recurse :=
    proc(sofar, srcpos, count, incl)
    local seen;

        if incl then
            admit[count] := admit[count] + 1;
        fi;

        if srcpos > nops(src) or count = m then
            return;
        fi;

        recurse(sofar, srcpos+1, count, false);

        seen := sofar union src[srcpos];
        if nops(seen) <= n then
            recurse(seen, srcpos+1, count+1, true);
        fi;
    end;

    recurse({}, 1, 0, false);

    res := 0;

    part := firstpart(m);

    while type(part, `list`) do
        psize := nops(part);
        mset := convert(part, `multiset`);

        res := res + admit[psize] *
        m!/mul(p!, p in part) *
        psize!/mul(p[2]!, p in mset);

        part := nextpart(part);
    od;

    res*binomial(s,d)^(-m);
end;


X :=
(s, d, m, n) ->
binomial(s,d)^(-m) *
add(binomial(s,p) *
    add(binomial(p,q)*(-1)^q*binomial(p-q,d)^m,
        q=0..p), p=1..n);

Addendum. We should get probability one when we have $n=s.$ To prove this we write

$${s\choose d}^{-m} \sum_{p=0}^s {s\choose p} \sum_{q=0}^p {p\choose q} (-1)^{p-q} {q\choose d}^m \\ = {s\choose d}^{-m} \sum_{q=0}^s {q\choose d}^m (-1)^q \sum_{p=q}^s {s\choose p} {p\choose q} (-1)^{p}.$$

Observe that

$${s\choose p} {p\choose q} = \frac{s!}{(s-p)! \times q! \times (p-q)!} = {s\choose q} {s-q\choose s-p}$$

so this is

$${s\choose d}^{-m} \sum_{q=0}^s {q\choose d}^m (-1)^q {s\choose q} \sum_{p=q}^s {s-q\choose s-p} (-1)^{p} \\ = {s\choose d}^{-m} \sum_{q=0}^s {q\choose d}^m (-1)^q {s\choose q} \sum_{p=0}^{s-q} {s-q\choose s-q-p} (-1)^{p+q} \\ = {s\choose d}^{-m} \sum_{q=0}^s {q\choose d}^m {s\choose q} \sum_{p=0}^{s-q} {s-q\choose p} (-1)^p.$$

Evaluating the inner sum we now find

$${s\choose d}^{-m} \sum_{q=0}^s {q\choose d}^m {s\choose q} \times [[q=s]] = {s\choose d}^{-m} {s\choose d}^m {s\choose s} = 1$$

and the sanity check goes through.

Remark. The details of the Stirling number argument are in the comments to MSE 2791477.

Ways to get $<= n$ distinct items across $m$ samples of size $d$ from a set of size $s$

Question:

Motivation:

Clarifications:

My solution so far:

There are 1 best solutions below

Related Questions in PROBABILITY

Related Questions in COMBINATORICS

Trending Questions

Popular # Hahtags

Popular Questions