This is a generalization of two earlier questions: question 1 and question 2.
There are $n$ hash tables, each of size $m$. Each turn, one of the $nm$ entries is selected uniformly at random and filled. If the entry selected that turn has already been filled, then we've wasted that turn, and we go to the next turn as usual. What is the expected number of turns needed for one of the hash tables (whichever is first) to be completely filled?
Another way of asking this: if we have an $nm$-sided die and we partition the possible values into $n$ equal-sized sets of $m$ elements each, what is the expected number of rolls before we complete one of the sets?
For example, if we have a 15-sided die and we split the numbers into 3 sets of 5, $\{1,2,3,4,5\}$, $\{6,7,8,9,10\}$, and $\{11,12,13,14,15\}$, then $m = 5$ and $n=3$. On average, how many times must we roll the die before we've seen all $5$ values of one of the sets (whichever is first)?
In question 1 WhatsUp gave a recursive equation for a two-hash-table scenario: $$E(a, b) = 1 + \frac a {2m}E(a - 1, b) + \frac b {2m} E(a, b - 1) + \left(1 - \frac{a + b}{2m}\right)E(a, b)$$ where $E(a,b)$ is the expected number of turns needed to fill one table when there are still $a$ and $b$ empty entries in the two tables, respectively. Can this be modified for a three-hash-table scenario as $$E(a, b, c) = 1 + \frac a {3m}E(a - 1, b, c) + \frac b {3m} E(a, b - 1, c) + \frac c {3m} E(a, b, c - 1) + \left(1 - \frac{a + b + c}{3m}\right)E(a, b, c),$$ and if so, how do we go about solving for $E(m,m,m)$? Could we generalize this equation to $n$ hash tables, and if so, how?
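Assuming the recursion above is right, solving for $E(m,m,m)$ (or for any number of tables) is mechanical: move the $E(a,b,c)$ term to the left-hand side, which with $N=nm$ total slots and $S$ the number of still-empty entries gives $E = N/S + \sum_i \frac{a_i}{S}\,E(\dots,a_i-1,\dots)$, and then memoize. A minimal Python sketch (the names are my own):

```python
from functools import lru_cache

def expected_turns(sizes):
    """Expected number of turns until some table is completely filled,
    solving E = N/S + sum_i (a_i/S) * E(..., a_i - 1, ...),
    where N = total number of slots and S = number of still-empty slots."""
    N = sum(sizes)

    @lru_cache(maxsize=None)
    def E(state):
        if 0 in state:  # some table is already full: done
            return 0.0
        S = sum(state)
        return N / S + sum(
            a / S * E(tuple(sorted(state[:i] + (a - 1,) + state[i + 1:])))
            for i, a in enumerate(state)
        )

    return E(tuple(sorted(sizes)))
```

Sorting each state exploits the symmetry between tables, so the memo table stays small; e.g. `expected_turns((2, 2))` gives $11/3$.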
In question 2 there are some great answers involving Markov chains, but if $n$ and $m$ get very large, the transition matrix becomes impractically large to fill out or work with. (At least, that's how it seems to me. Are there tricks to deal with this?) So another approach is necessary. Thomas Andrews gave a very detailed answer which included a generalized solution which should work here: $$\sum_{i_1=1}^{a_1}\cdots\sum_{i_n=1}^{a_n}(-1)^{\sum (i_j-1)}\binom{a_1}{i_1}\cdots\binom{a_n}{i_n}\frac{a}{\sum i_j}$$ but it isn't intuitive to me and I'm struggling to see how it came about. He mentioned, and I can see, that it stems from an inclusion-exclusion argument, but I'm failing to see how to start there and end up with his equation.
Answer: The expected time until the first filling of any of the hash tables is $$ nm\sum_{k=1}^n (-1)^{k-1} \binom{n}{k}H_{km},\tag1 $$ where $H_r=\frac11+\frac12+\dots+\frac1{r}$ is the $r^\text{th}$ Harmonic number. We can also give a generalization for when the hash tables have unequal sizes. If there are $n$ tables, with sizes $m_1,\dots,m_n$, then the expected time is $$ (m_1+\dots+m_n)\times \!\!\!\!\!\!\!\!\!\sum_{\varnothing \neq S\subseteq \{1,\dots,n\}}(-1)^{|S|-1}H_{m_S},\tag2 $$ where $m_S$ is shorthand for $\sum_{i\in S}m_i$. The sum ranges over all nonempty subsets of $\{1,\dots,n\}$.
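As a sanity check, formula $(2)$ is easy to evaluate directly in exact arithmetic; here is a minimal Python sketch (the function names are my own):

```python
from itertools import combinations
from fractions import Fraction

def harmonic(r):
    """Exact r-th harmonic number H_r as a fraction."""
    return sum(Fraction(1, j) for j in range(1, r + 1))

def expected_first_fill(sizes):
    """Equation (2): (m_1+...+m_n) * sum over nonempty subsets S of
    (-1)^(|S|-1) * H_{m_S}, where m_S = sum of the sizes indexed by S."""
    n = len(sizes)
    total = Fraction(0)
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            m_S = sum(sizes[i] for i in S)
            total += (-1) ** (k - 1) * harmonic(m_S)
    return sum(sizes) * total
```

For example, `expected_first_fill((2, 2))` returns `Fraction(11, 3)`; passing $n$ equal sizes recovers formula $(1)$.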
Proof: Let $T_1,T_2,\dots,T_n$ be random variables, where $T_i$ is the time it takes to fill the $i^\text{th}$ hash table. We want to find the expected value of $\min(T_1,\dots,T_n)$. We start with the following algebraic relation between $\min$ and $\max$. For any real numbers $x_1,\dots,x_n$, $$ \min(x_1,\dots,x_n)=\sum_{\varnothing \neq S\subseteq \{1,\dots,n\}} (-1)^{|S|-1}\max_{i\in S}x_{i}.\tag3 $$ This can be proven by induction on $n$, using the relation $\min(x_1,x_2)=x_1+x_2-\max(x_1,x_2)$.
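Identity $(3)$ is easy to spot-check numerically before trusting it; a small sketch (function name mine):

```python
from itertools import combinations
import random

def min_via_inclusion_exclusion(xs):
    """Right-hand side of (3): alternating sum of max over nonempty subsets."""
    n = len(xs)
    total = 0.0
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            total += (-1) ** (k - 1) * max(xs[i] for i in S)
    return total

# Check against the built-in min on random inputs.
random.seed(0)
for _ in range(100):
    xs = [random.random() for _ in range(4)]
    assert abs(min_via_inclusion_exclusion(xs) - min(xs)) < 1e-9
```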
We apply $(3)$ with $x_i\gets T_i$ and then take the expected value of both sides: $$ E[\min(T_1,\dots,T_n)]=\sum_{\varnothing \neq S\subseteq \{1,\dots,n\}} (-1)^{|S|-1}E[\,\max_{i\in S} T_i\,].\tag4 $$ Note that $\max_{i\in S} T_i$ is the time it takes to fill all of the hash tables whose indices are in $S$. Letting $m_S=\sum_{i\in S}m_i$, I claim that the expected time to do this is $$ E[\,\max_{i\in S} T_i\,]=(m_1+\dots+m_n)H_{m_S}.\tag5 $$ To see this, call a sample useful if it is drawn from one of the hash tables in $S$, and wasted otherwise. If we ignore all of the wasted samples, then this is exactly the coupon collector's problem where we try to collect all of the coupons in the union of the hash tables in $S$. This means that $$ E[\text{# useful samples}]=m_S\cdot H_{m_S}.\tag6 $$ To account for wasted samples, note that the expected time until the next useful sample is $\frac{m_1+\dots+m_n}{m_S}$, because that time is geometrically distributed with success probability $\frac{m_S}{m_1+\dots+m_n}$. By Wald's identity, it follows that $$ E[\,\max_{i\in S} T_i\,]=\frac{m_1+\dots+m_n}{m_S}\cdot E[\text{# useful samples}].\tag7 $$ Combining $(6)$ and $(7)$ proves $(5)$; combining $(4)$ and $(5)$ proves $(2)$. Finally, $(1)$ is the special case of $(2)$ where $m_1=\dots=m_n=m$.
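Formula $(1)$ can also be confirmed end to end against a direct simulation of the process; a minimal Monte Carlo sketch (the trial count, seed, and names are my own choices):

```python
import random
from math import comb

def formula(n, m):
    """Closed form (1): nm * sum_{k=1}^n (-1)^(k-1) C(n,k) H_{km}."""
    H = [0.0]  # H[r] = r-th harmonic number
    for r in range(1, n * m + 1):
        H.append(H[-1] + 1 / r)
    return n * m * sum((-1) ** (k - 1) * comb(n, k) * H[k * m]
                       for k in range(1, n + 1))

def simulate_first_fill(n, m, trials=20000, seed=1):
    """Monte Carlo estimate: run the process until some table fills."""
    rng = random.Random(seed)
    total_turns = 0
    for _ in range(trials):
        filled = [[False] * m for _ in range(n)]
        empty = [m] * n
        turns = 0
        while True:
            turns += 1
            t, p = divmod(rng.randrange(n * m), m)  # table, slot
            if not filled[t][p]:
                filled[t][p] = True
                empty[t] -= 1
                if empty[t] == 0:  # table t is the first to fill
                    break
        total_turns += turns
    return total_turns / trials
```

For $n=2$, $m=2$ the closed form gives $11/3\approx 3.667$, and the simulated average lands close to it.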