Choose $x$ objects without replacement from a bag with $n$ object.

Question

Choose $x$ objects without replacement from a bag with $n$ object.

476 Views Asked by Bumbble Comm At 02 Apr 2026 - 7:01

General problem:

Suppose there is a bag containing $n$ items with $m$ unique values $(m \leq n)$. The distribution of values across all the items is uniform. How many unique values I most probably get if I draw $x$ $(x \leq n)$ items from the bag without replacement?

My concrete problem:

In a relational engine I keep cardinality of a table as well as cardinalities (i.e. number of unique values) of individual columns of the table. When a filter predicate is applied against a particular column (e.g. MyColumn = 'value') I calculate its selectivity factor (e.g. $0.8$) and reduce the table (e.g. from $100$ rows down to $80$ rows). The cardinality of the referenced column in the result is clear ($0.8$ times the original cardinality of the column), but the problem is how to calculate cardinality of each remaining column. Again, I assume uniform distribution of values in every column.

Update:

It turns out my concrete problem is not solved sufficiently by the general question I posed above. It rather should to be described as drawing with replacement. However, I keep the question still available since there are two good answers to the general problem.

Original Q&A

There are 2 best solutions below

**Bumbble Comm** · Answer 1 · 2016-05-09 23:17:59

Suppose we have $n$ items of $m$ types where $n=km.$ We draw $p$ items and ask about the expected value of the number of distinct items that appear.

First compute the total count of possible configurations. The species here is

$$\mathfrak{S}_{=m}(\mathfrak{P}_{\le k}(\mathcal{Z})).$$

This gives the EGF $$G_0(z) = \left(\sum_{q=0}^k \frac{z^q}{q!}\right)^m.$$

The count of configurations is then given by (compute this as a sanity check)

$$p! [z^p] G_0(z).$$

Note however that in order to account for the probabilities we are missing a multinomial coefficient to represent the configurations beyond $p.$ If a set of size $q$ was chosen for a certain type among the first $p$ elements that leaves $k-q$ elements to distribute. Therefore we introduce

$$G_1(z) = \left(\sum_{q=0}^k \frac{z^q}{q! (k-q)!}\right)^m.$$

The desired quantity is then given by

$$(n-p)! p! [z^p] G_1(z) = (n-p)! p! \frac{1}{(k!)^m} [z^p] \left(\sum_{q=0}^k {k\choose q} z^q\right)^m \\ = (n-p)! p! [z^p] \frac{1}{(k!)^m} (1+z)^{km} = {n\choose p} \frac{1}{(k!)^m} (n-p)! p! \\ = \frac{n!}{(k!)^m}$$

The sanity check goes through. What we have done here in this introductory section is classify all ${n\choose k,k,\ldots k}$ combinations according to some fixed value of $p,$ extracting the information of the distribution of values among the first $p$ items. We should of course get all of them when we do this, and indeed we do.

Now we need to mark zero size sets. The count of zero size sets counts the types that are not present. We get the EGF

$$H(z,u) = \left(\frac{u}{k!} + \sum_{q=1}^k \frac{z^q}{q! (k-q)!}\right)^m.$$

For the count of the number of types that are not present we thus obtain

$$(n-p)! p! [z^p] \left.\frac{\partial}{\partial u} H(z, u)\right|_{u=1}.$$

We have

$$\frac{\partial}{\partial u} H(z, u) = m \left(\frac{u}{k!} + \sum_{q=1}^k \frac{z^q}{q! (k-q)!}\right)^{m-1} \frac{1}{k!}$$

Evaluate this at $u=1$ to get

$$\frac{m}{k!} \left(\sum_{q=0}^k \frac{z^q}{q! (k-q)!}\right)^{m-1}.$$

Extracting coefficients yields

$$\frac{m}{k!} (n-p)! p! [z^p] \frac{1}{(k!)^{m-1}} \left(\sum_{q=0}^k {k\choose q} z^q\right)^{m-1} \\ = \frac{m}{k!} (n-p)! p! [z^p] \frac{1}{(k!)^{m-1}} (1+z)^{km-k} \\ = {n-k\choose p} m (n-p)! p! \frac{1}{(k!)^{m}}.$$

Therefore the expectation turns out to be

$$m - m {n-k\choose p} {n\choose p}^{-1} = m \left(1 - {n-k\choose p} {n\choose p}^{-1}\right).$$

Remark. The simplicity of this answer is evident and an elegant and straightforward probabilistic argument is sure to appear.

Remark II. For any remaining sceptics and those seeking to know more about the probability model used here I present the Maple code for this work, which includes total enumeration as well as the formula from above. Routines ordered according to efficiency and resource consumption.

with(combinat);

Q :=
proc(m, k, p)
option remember;
local n, perm, items, dist, res;

    n := m*k;

    items :=
    [seq(seq(r, q=1..k), r=1..m)];

    res := 0;

    for perm in permute(items) do
        dist :=
        convert([seq(perm[q], q=1..p)], `set`);

        res := res + nops(dist);
    od;

    res/(n!/(k!)^m);
end;


QQ :=
proc(m, k, p)
option remember;
local n, perm, items, dist, rest, res;

    n := m*k;

    items :=
    [seq(seq(r, q=1..k), r=1..m)];

    res := 0;

    for perm in choose(items, p) do
        dist := convert(perm, `set`);

        rest := p!* (n-p)!
        /mul(q[2]!*(k-q[2])!,
             q in convert(perm, `multiset`));

        rest := rest/(k!)^(m-nops(dist));

        res := res + rest*nops(dist);
    od;

    res/(n!/(k!)^m);
end;

X :=
proc(m, k, p)
local n;

    n := m*k;
    m*(1-binomial(n-k,p)/binomial(n,p));
end;

**Bumbble Comm** · Answer 2 · 2016-05-10 00:04:36

To elaborate on the discussion in the comments: Indicator variables can be very helpful for problems like these. Accordingly, let $X_i$ be the indicator variable for the $i^{th}$ value. Thus $X_i=1$ if your draw of $x$ elements gets one of value $i$, and $X_i=0$ otherwise. It is easy to compute $E[X_i]$...if $p_i$ denotes the probability that the $i^{th}$ value is drawn then we see that $E[X_i]=p_i$ and $$1-p_i=\frac {(m-1)k}{mk}\times \frac {(m-1)k-1}{mk-1}\times \cdots \times \frac {(m-1)k-(x-1)}{mk-(x-1)}=\frac {(n-k)!(n-x)!}{(n-k-x)!(n!)}$$ Where $m$ is the number of value types, $n=km$ is the total number of items, and $x$ is the number of draws.

The desired answer is then: $$E=\sum_{i=1}^mE[X_i]=m\left(1-\frac {(n-k)!(n-x)!}{(n-k-x)!(n!)}\right)$$

Note: it is not difficult to check that this matches the answer given by @MarkoRiedel (I have used $x$ for the number of draws, following the OP, instead of $p$).

Choose $x$ objects without replacement from a bag with $n$ object.

There are 2 best solutions below

Related Questions in PROBABILITY

Related Questions in UNIFORM-DISTRIBUTION

Trending Questions

Popular # Hahtags

Popular Questions