What is the probability of picking a full set from multiset after $m$ draws?


Suppose a bag contains $n$ balls labeled from $1$ to $n$, and suppose I have $k$ of these bags. If I open all of these $k$ bags into an urn, then the urn is effectively a multiset with $kn$ elements: $k$ balls labeled $1$, $k$ balls labeled $2$, and so on up to $n$. My question is

If I were to randomly pick out balls from the urn one by one without replacement, what's the probability of having picked out a complete set of balls labeled $1$ to $n$ after $m$ balls have been pulled out of the urn?


I'm not very familiar with probability, so I'm not sure what's the correct setup for the problem. I suspect the answer has to do with the binomial coefficients, as we're choosing $m$ elements from a set with $kn$ things, but I don't know how to account for the repetition of elements.

Any and all help is greatly appreciated. Thank you!


There are 3 answers below.

Accepted answer:

Answer: The probability that you are done after $m$ draws is $$ \sum_{i=0}^n (-1)^i \binom{n}i\binom{k(n-i)}{m}\Big/\binom{kn}{m}. $$

Proof: This is a routine application of the principle of inclusion-exclusion. For each $i\in \{1,\dots,n\}$, let $E_i$ be the event that none of the $m$ balls you drew is numbered $i$. We want to find $$P(E_1^c\cap \dots \cap E_n^c)=1-P(E_1\cup \dots \cup E_n).$$ By PIE, this is equal to $$ \sum_{i=0}^n(-1)^{i} \sum_{1\le j(1)<j(2)<\dots <j(i)\le n} P\big(E_{j(1)}\cap \dots \cap E_{j(i)}\big). $$ By the symmetry of the problem, for all choices of $j(1),j(2),\dots,j(i)$ we have $P\big(E_{j(1)}\cap \dots \cap E_{j(i)}\big)=P(E_1\cap \dots \cap E_i)$, so the above simplifies to $$ \sum_{i=0}^n (-1)^i \binom{n}i P(E_1\cap \dots \cap E_i). $$ Finally, $E_1\cap \dots \cap E_i$ is the event that no ball numbered $1$ to $i$ is drawn. The probability of this occurring is $\binom{k(n-i)}{m}\Big/ \binom{kn}m$, since the favorable outcomes are determined by choosing $m$ balls from the $k(n-i)$ balls that are not labeled with a number between $1$ and $i$.
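A quick numerical check of this formula, using Python's math.comb (the helper name full_set_probability is just for illustration):

from math import comb

def full_set_probability(n, k, m):
    # probability that m draws without replacement from k copies each of
    # n labels contain every label, via the inclusion-exclusion formula above
    total = comb(k * n, m)
    return sum((-1) ** i * comb(n, i) * comb(k * (n - i), m)
               for i in range(n + 1)) / total

# e.g. n = 3 labels, k = 2 copies of each, m = 4 draws:
# the only way to fail is to leave behind two balls with the same label,
# so the probability is 1 - 3/C(6,2) = 0.8
print(full_set_probability(3, 2, 4))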

Another answer:

While definitely not a complete solution, I have an answer if $k$ is very large (I think $k \gg m$ is sufficient). This allows us to ignore the possibility that we might run out of any label, and it also means that each draw is approximately independent, with each label being drawn with probability approximately $\frac{1}{n}$.

Now, let's think of picking $m$ balls as starting with the set of the first $m$ natural numbers and then assigning each of these $m$ natural numbers a ball type (i.e. an integer from $1$ to $n$). This is in bijection with picking the $m$ balls: the type assigned to the $j^\text{th}$ element of the set is the type of the $j^\text{th}$ ball we draw.

Since we want each of these $n$ ball types to be assigned to an element of the set at least once, we should count how many of these assignments follow this rule.

This process is equivalent to partitioning the $m$-element set into $n$ distinguishable nonempty blocks. You may notice the resemblance to the definition of the Stirling numbers of the second kind; in fact, the number of such assignments is $n!\,S(m,n)$.

Since each of the $n^m$ possible assignments occurs with probability $\frac{1}{n^m}$, the total probability that we get at least one of every ball type is $\frac{n!\,S(m,n)}{n^m}$.

When $k$ gets closer to $m$, this expression diverges from the actual probability. I believe the actual probability should be larger: when drawing without replacement, the labels you have already picked are depleted, so the next draw is more likely to be a label you have not yet picked than the independent approximation suggests.

I also wrote this Python code, which I believe correctly computes the probability for any $n, k, m$:

import scipy.special as sc
import itertools
import math
from sympy.functions.combinatorial.numbers import stirling


n = 3
k = 60  # k can be much larger than m
m = 4   # m should be greater than or equal to n

denominator = sc.binom(n*k, m)
# this gives the denominator of the probability fraction
# it is also the total number of possible draws if we ignore order
# (which we will, because of how itertools.combinations works)

bag = list(range(1, n+1))*k  # a list that contains k copies of each natural number 1 to n

counter = 0
for combo in itertools.combinations(bag, m):  # iterate over all possible draws of size m
    # if a drawing contains exactly n distinct labels, then it contains
    # each type of ball, so only those drawings are counted
    if len(set(combo)) == n:
        counter += 1

print(counter/denominator)  # exact probability by brute force

print(stirling(m, n)*math.factorial(n)/math.pow(n, m))  # large-k approximation
Another answer:

Incomplete illustration:

Picking out a complete set of balls is equivalent to the number of balls labeled $i$ (let's simply call these ball $i$) being nonzero for every $i = 1, 2, \ldots, n$ in the sample of size $m$.

Let $E_i$ be the event that there is no ball $i$ in the sample of size $m$, for $i = 1, 2, \ldots, n$.

The probability of the required event, that the number of ball $i$ is nonzero for every $i = 1, 2, \ldots, n$, can be expressed as

$$ \begin{align} P\left(\bigcap_{i=1}^n E_i^c\right) &= 1 - P\left(\left(\bigcap_{i=1}^n E_i^c\right)^c\right) \\ &= 1 - P\left(\bigcup_{i=1}^n E_i \right)\end{align}$$

Once we have this union of events in hand, the principle of inclusion-exclusion that Empy2 mentioned in the comments kicks in.

For $0 \leq m < n$, it is impossible to have a complete set of balls, which has size $n$, so the probability is $0$.

For $(n-1)k < m \leq nk$, by the pigeonhole principle the required probability is $1$: fewer than $k$ balls remain in the urn, so we cannot completely miss a label that has $k$ copies in the urn.

For $n \leq m \leq (n-1)k$, using a simple hypergeometric probability argument,

$$ P(E_i) = \frac {\displaystyle \binom {k} {0} \binom {(n-1)k} {m}} {\displaystyle \binom {nk} {m}}$$

$$ P(E_i \cap E_j) = \frac {\displaystyle \binom {2k} {0} \binom {(n-2)k} {m}} {\displaystyle \binom {nk} {m}}, \quad i \neq j$$

$$ P(E_i \cap E_j \cap E_l) = \frac {\displaystyle \binom {3k} {0} \binom {(n-3)k} {m}} {\displaystyle \binom {nk} {m}}, \quad i, j, l \text{ are distinct}$$

and so on.

Using the inclusion-exclusion principle, $$ \begin{align} P\left(\bigcup_{i=1}^n E_i \right) &= \sum_{i=1}^n P(E_i) - \sum_{i < j} P(E_i \cap E_j) + \sum_{i < j < l} P(E_i \cap E_j \cap E_l) \\ & ~~~~ - \ldots + (-1)^{n-1} P\left(\bigcap_{i=1}^n E_i\right) \end{align}$$

Actually the last probability must be zero, since at least one ball is chosen and so at least one label appears in the sample.

Some special cases: when $(n-2)k < m \leq (n-1)k$, we can miss at most one label completely. So in such cases all $E_i \cap E_j = \varnothing$, and the probability reduces to

$$ \begin{align} 1 - \sum_{i=1}^n P(E_i) &= 1 - n \frac {\displaystyle \binom {k} {0} \binom {(n-1)k} {m}} {\displaystyle \binom {nk} {m}} \\ &= 1 - n \frac {(nk-k)! (nk-m)!} {(nk)!(nk - m - k)! } \\ &= 1 - n \prod_{j=1}^k \frac {nk - m + 1 - j} {nk + 1 - j} \\ &= 1 - n \prod_{j=1}^k \left(1 - \frac {m} {nk + 1 - j }\right) \end{align} $$
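As a quick sanity check of this simplification (a small sketch; the values of $n$, $k$, $m$ below are arbitrary choices inside the range $(n-2)k < m \leq (n-1)k$):

from math import comb, prod

n, k = 4, 5
m = 12  # (n-2)*k = 10 < m <= (n-1)*k = 15

direct = 1 - n * comb((n - 1) * k, m) / comb(n * k, m)
product_form = 1 - n * prod(1 - m / (n * k + 1 - j) for j in range(1, k + 1))
print(direct, product_form)  # the two values agree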

For $m$ small, close to $n$, we can use another method to count. E.g. when $m = n$, each number must appear exactly once, and therefore

$$ \frac {\displaystyle \binom {k} {1}^n} {\displaystyle \binom {kn} {n} } = \frac {k^n n! (kn - n)!} {(kn)!}$$

When $m = n+1$, the configuration must be that one of the numbers appears exactly twice while the others appear exactly once, so the probability is

$$ \binom{n} {1} \frac {\displaystyle \binom {k} {1}^{n-1} \binom {k} {2}} {\displaystyle \binom {kn} {n+1} }$$

etc. So we can enumerate the possible configurations and count, but this becomes intractable as $m$ grows.
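A brief check of the $m = n$ and $m = n + 1$ configuration counts against the general inclusion-exclusion formula from the accepted answer (a sketch; the values of $n$ and $k$ are arbitrary, and full_set_probability is just an illustrative helper name):

from math import comb

n, k = 4, 3

# m = n: every label appears exactly once
p_n = comb(k, 1) ** n / comb(k * n, n)

# m = n + 1: one label appears exactly twice, the rest exactly once
p_n1 = n * comb(k, 1) ** (n - 1) * comb(k, 2) / comb(k * n, n + 1)

# general inclusion-exclusion formula for comparison
def full_set_probability(n, k, m):
    return sum((-1) ** i * comb(n, i) * comb(k * (n - i), m)
               for i in range(n + 1)) / comb(k * n, m)

print(p_n, full_set_probability(n, k, n))
print(p_n1, full_set_probability(n, k, n + 1))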