Upper bound of probability of not getting all values in independent draws


We draw $m$ uniform independent random values among $n$, with $m\ge n$. We consider the probability $p(m,n)$ that not all $n$ values have been drawn. We want an upper bound within a constant factor.

The probability that a given value was not drawn is $q(m,n)=(1-1/n)^m$. If these events were independent, we could use $1-(1-q(m,n))^n$ for $p(m,n)$. However, the events are dependent, so that expression is not a rigorous bound.

How can we rigorously derive an upper bound? It need not be tight; I'd be happy with anything within a constant factor.


The question's motivation is to prove that $\displaystyle\lim_{n\to+\infty}p(n\left\lceil\log_2(n)\right\rceil,n)=0$, if possible with an explicit upper bound (towards answering an often-asked question about cryptographic hashes reaching their whole stated destination set, under a random oracle model).

This itself is plausible because (as studied in the coupon collector's problem) the expected number of draws to get all $n$ values is $n(\ln(n)+\gamma)+\frac12+\mathcal O(\frac1n)$, and $n\left\lceil\log_2(n)\right\rceil$ grows faster than that, by a factor $\frac1{\ln(2)}>1$.
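The coupon-collector asymptotic above is easy to sanity-check numerically: the exact expectation is $nH_n$, which can be compared with $n(\ln(n)+\gamma)+\frac12$ and with simulation. A rough Python sketch (the trial count and seed are arbitrary choices):

```python
import math
import random

def expected_draws_simulated(n, trials=2000, seed=1):
    """Monte Carlo estimate of the expected number of uniform draws
    from n values needed to see every value at least once."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen, draws = set(), 0
        while len(seen) < n:
            seen.add(rng.randrange(n))
            draws += 1
        total += draws
    return total / trials

n = 100
exact = n * sum(1 / k for k in range(1, n + 1))  # n * H_n
approx = n * (math.log(n) + 0.5772156649) + 0.5  # n(ln n + gamma) + 1/2
print(exact, approx, expected_draws_simulated(n))
```

For $n=100$ all three values land near $519$, and the closed form and the asymptotic agree to within the stated $\mathcal O(\frac1n)$ term.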

There are 3 answers below.

Best answer

Define the indicator variable $X_{ij}=\mathbf{1}[\text{ball } j \text{ falls in bin } i]$ with $Y_i=\sum_{j=1}^m X_{ij}$ the load in bin $i.$ This is a sum of independent $X_{ij}$'s.

The expected load of bin $i$ is $\mu=m/n.$ The lower tail of the Chernoff bound is $$ \mathbb{P}[Y_i\leq (1-\delta)\mu]< \left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^\mu. $$ Choose $\delta$ such that, say, $(1-\delta)\mu=0.99$ (i.e., $\delta=1-\frac{0.99}{\mu}$), which means that $$ \mathbb{P}[Y_i\leq (1-\delta)\mu]=\mathbb{P}[Y_i\leq 0.99]= \mathbb{P}[Y_i=0]<\left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^\mu\quad (*). $$ Now $m=\lceil \lg n \rceil n,$ so $\mu=\lceil \lg n\rceil$ where $\lg(\cdot)$ denotes $\log_2(\cdot).$ Taking logarithms, equation $(*)$ becomes $$ \mathbb{P}[Y_i=0]< \exp\left(-\mu+0.99+0.99\ln\left(\frac{\mu}{0.99}\right)\right), $$ where the last two terms are $O(\ln\ln n)=o(\ln n).$ Ignoring the ceiling in $\lceil\lg n\rceil$ yields $$ \mathbb{P}[Y_i=0]<\exp\left(-1.4426\ln n+o(\ln n)\right)=n^{-1.4426+o(1)}. $$ Now apply the union bound over the $n$ bins to get $$ \mathbb{P}[\text{some bin empty}]=\mathbb{P}[\exists i: Y_i=0]<n^{-0.4426+o(1)}\to 0. $$ (In fact, for this particular event the Chernoff machinery is not needed: $\mathbb{P}[Y_i=0]=(1-1/n)^m\leq e^{-m/n}=e^{-\mu}\leq n^{-1/\ln 2},$ with $1/\ln 2\approx 1.4427,$ directly.)
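The conclusion can be checked numerically against the exact probability that some bin is empty, computed by inclusion-exclusion. A Python sketch (the choice $n=256$ is arbitrary; being a power of two makes $\lceil\lg n\rceil=\lg n=8$ exact):

```python
import math
from math import comb

def p_some_bin_empty(m, n):
    """Exact P[some bin empty after m balls into n bins], by
    inclusion-exclusion with exact integer arithmetic."""
    num = sum((-1) ** (j - 1) * comb(n, j) * (n - j) ** m
              for j in range(1, n + 1))
    return num / n ** m

n = 256                           # power of two, so ceil(lg n) = lg n = 8
m = n * math.ceil(math.log2(n))   # m = 2048
exact = p_some_bin_empty(m, n)
union = n * (1 - 1 / n) ** m      # union bound: n * P[a given bin empty]
print(exact, union, n ** (1 - 1 / math.log(2)))
```

All three values come out below $0.09$, with the exact probability below the union bound, which is in turn below $n^{1-1/\ln 2}$.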


Theorem: (Theorem 5.13 in Probability and Computing, Mitzenmacher and Upfal)

Let $X$ be the number of coupons observed before obtaining one of each of $n$ coupons. Then for any constant $c$ $$ \lim_{n\rightarrow \infty} \mathbb{P}[X>n \ln n+ c n]= 1-e^{-e^{-c}}. $$

Already for $c=5,$ the right hand side is as small as $0.007.$

Proof outline: The Poisson approximation becomes exact in the limit $n\rightarrow \infty.$ So assume the number of balls in each bin is a Poisson r.v. with mean $\mu=\ln n+c,$ which gives that the probability that a specific bin is empty is $$ \frac{e^{-c}}{n}, $$ and since the bin loads are independent under the Poisson approximation, the probability that no bin is empty is $$ \left(1-\frac{e^{-c}}{n}\right)^n \approx \exp(-e^{-c}). $$

The proof is delicate and proceeds by splitting the event (where dependence on $n$ is suppressed in the notation) $$ {\cal E}=\{\text{no bin is empty}\} $$ as $$ \mathbb{P}({\cal E})=\mathbb{P}({\cal E} \cap (|X-\mu|\leq \sqrt{2m \ln n}))+ \mathbb{P}({\cal E} \cap (|X-\mu|> \sqrt{2m \ln n}))\quad (1) $$ and showing that:

  1. $\mathbb{P}(|X-\mu|> \sqrt{2m \ln n})$ is $o(1).$

  2. $\left| \mathbb{P}({\cal E}~|~|X-\mu|\leq \sqrt{2m \ln n})-\mathbb{P}({\cal E}~|~X=m)\right|$ is $o(1).$

  3. Point 1 guarantees that the second term on the right of $(1)$ is $o(1).$

  4. Point 2 guarantees that the difference between the experiment resulting in exactly $m$ vs nearly $m$ balls makes an asymptotically negligible difference in the probability that every bin has a ball.

Complements: Over 99.3 percent of the time the number of coupons required lies in $[n \ln n - 5n, n \ln n+ 5 n]$.
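Since $\{X>m\}$ is exactly the event that $m$ draws miss some coupon, the limit in the theorem can be compared against the exact inclusion-exclusion probability at finite $n$. A sketch ($n=500$ and $c=0$ are arbitrary choices; at $c=0$ the limit is $1-e^{-1}\approx 0.632$):

```python
import math
from math import comb

def p_miss(m, n):
    """Exact P[m uniform draws from n values miss at least one value];
    this equals P[X > m] for the coupon-collector time X."""
    return sum((-1) ** (j - 1) * comb(n, j) * (n - j) ** m
               for j in range(1, n + 1)) / n ** m

n, c = 500, 0
m = math.ceil(n * math.log(n) + c * n)
limit = 1 - math.exp(-math.exp(-c))  # Theorem 5.13 limit, ~0.632 at c=0
print(p_miss(m, n), limit)
```

Already at $n=500$ the exact value agrees with the limit to about two decimal places.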

The relationship below may be of interest as well (Theorem 5.7 from the same book) for problems of this type involving balls and bins.

Theorem: Let $m$ balls be thrown uniformly and independently into $n$ bins. Let $f(x_1,\ldots,x_n)$ be a nonnegative function; then $$ \mathbb{E}[f(X_1^{(m)},\ldots,X_n^{(m)})] \leq e \sqrt{m}\, \mathbb{E}[f(Y_1^{(m)},\ldots,Y_n^{(m)})], $$ where $X_i^{(m)}$ is the number of balls in the $i^{th}$ bin, for $1\leq i\leq n,$ and where $Y_i^{(m)}$ are i.i.d. Poisson with mean $m/n.$

Another answer

By the standard urns and balls interpretation, we have $$p(m,n)=1-\frac{n! \, S_{m,n}}{n^m}$$

where $S_{m,n}$ is the Stirling number of the second kind. The question is then equivalent to asking for a lower bound on $S_{m,n}$. There are many, but don't expect anything very neat... or practical.
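The identity $p(m,n)=1-n!\,S_{m,n}/n^m$ is easy to verify in code: $n!\,S_{m,n}$ counts the surjections from $m$ draws onto $n$ values and agrees exactly with the inclusion-exclusion count. A sketch using the standard recurrence (the values $m=50$, $n=10$ are arbitrary):

```python
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def stirling2(m, n):
    """Stirling number of the second kind via the recurrence
    S(m, n) = n*S(m-1, n) + S(m-1, n-1)."""
    if m == n:
        return 1
    if n == 0 or n > m:
        return 0
    return n * stirling2(m - 1, n) + stirling2(m - 1, n - 1)

m, n = 50, 10
surjections = factorial(n) * stirling2(m, n)  # n! * S(m, n)
p = 1 - surjections / n ** m
# cross-check: inclusion-exclusion count of surjections
alt = sum((-1) ** j * comb(n, j) * (n - j) ** m for j in range(n + 1))
print(p, surjections == alt)
```

The two counts match as exact integers, so the two expressions for $p(m,n)$ are identical, not merely close.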

Added: Still not an answer, but another plausibility argument. The numbers of balls in the urns can be approximated (Poissonization) by $n$ i.i.d. Poisson variables with $\lambda=m/n$. Then the probability that all urns are occupied is

$$ 1-p(m,n) \approx \left( 1 - \exp(- \lambda)\right)^n$$

If we take $m = a\, n \log(n)$ and assume that the approximation is asymptotically precise, then for large $n$ we get

$$ 1-p(m,n) \approx \exp(- n^{1-a}) \to \begin{cases} 0 & a<1 \\ 1 & a>1\end{cases}$$
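Under this Poissonization heuristic the dichotomy is easy to see numerically. A minimal sketch ($n=10^6$ and the exponents $a=0.5,\ 1.5$ are arbitrary choices on either side of the threshold):

```python
def coverage_poisson(n, a):
    """Poissonization estimate of P[all n values seen] after
    m = a*n*log(n) draws: (1 - exp(-a*log n))^n = (1 - n^(-a))^n."""
    return (1 - n ** (-a)) ** n

n = 10 ** 6
print(coverage_poisson(n, 0.5))  # ~exp(-sqrt(n)): essentially 0
print(coverage_poisson(n, 1.5))  # ~exp(-n^(-1/2)): close to 1
```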

Another answer

Approximation of the Probability

The probability that $j$ specified values were all missed, summed over the ways to choose them, is $$ N_j=\overbrace{\quad\binom{n}{j}\quad}^{\substack{\text{number of}\\\text{ways to choose}\\\text{the $j$ missed}}}\overbrace{\left(1-\frac jn\right)^m\vphantom{\binom{n}{j}}}^{\substack{\text{probability of}\\\text{missing those}\\\text{$j$ values}}}\tag1 $$ The Principle of Inclusion-Exclusion says that the probability of missing at least one value is $$ p(m,n)=\sum_{j=1}^n(-1)^{j-1}\binom{n}{j}\left(1-\frac jn\right)^m\tag2 $$ Therefore, $$ \begin{align} p(n\log_2(n),n) &=\sum_{j=1}^n(-1)^{j-1}\binom{n}{j}\left(1-\frac jn\right)^{n\log_2(n)}\tag{3a}\\ &\sim\sum_{j=1}^n(-1)^{j-1}\binom{n}{j}e^{-j\log_2(n)}\tag{3b}\\[3pt] &=\left(1-\left(1-e^{-\log_2(n)}\right)^n\right)\tag{3c}\\[9pt] &=\left(1-\left(1-\tfrac1{n^{1/\hspace{-1pt}\log(2)}}\right)^n\right)\tag{3d}\\[9pt] &\sim n^{1-1/\hspace{-1pt}\log(2)}\tag{3e}\\[12pt] &\to0\tag{3f} \end{align} $$ Explanation:
$\text{(3a)}$: apply $(2)$
$\text{(3b)}$: Poisson approximation
$\text{(3c)}$: Binomial Theorem
$\text{(3d)}$: $e^{-\log_2(n)}=\frac1{n^{1/\hspace{-1pt}\log(2)}}$
$\text{(3e)}$: Binomial approximation
$\text{(3f)}$: $1-1/\log(2)\approx-0.442695$


Comparison of Values $$ \begin{array}{c|c|c} n&p(n\log_2(n),n)&n^{1-1/\hspace{-1pt}\log(2)}\\\hline 10&0.275663&0.360832\\ 100&0.118840&0.130200\\ 1000&0.045682&0.046980\\ 10000&0.016798&0.016952 \end{array} $$
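The table can be reproduced by evaluating sum $(2)$ with the real exponent $m=n\log_2(n)$; the alternating terms decay fast, so the sum can be truncated early. A Python sketch using log-gamma for the binomials to avoid overflow at large $n$ (the truncation tolerance is an arbitrary choice):

```python
import math

def p_miss_real(m, n, tol=1e-18):
    """Inclusion-exclusion sum (2) with a real exponent m, truncated
    once the (rapidly decaying) terms become negligible."""
    total = 0.0
    for j in range(1, n):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1)
                    - math.lgamma(n - j + 1) + m * math.log(1 - j / n))
        term = math.exp(log_term)
        total += term if j % 2 == 1 else -term
        if term < tol:
            break
    return total

for n in (10, 100, 1000, 10000):
    print(n, p_miss_real(n * math.log2(n), n), n ** (1 - 1 / math.log(2)))
```

The printed values match the two columns of the table to the precision shown.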