Arranging $n$ balls in $k$ bins so that $m$ consecutive bins are empty

1k Views Asked by At

This question is inspired by the following problem:

Randomly place seven balls into ten bins, with no bin containing more than one ball. What is the probability that there will be (at least) two consecutive bins that are empty?

Note that this is not a circular arrangement, although that could make for an interesting (if simpler) question. The actual question I have is the same as the one above, but with the numbers written in emphasized text generalized as positive integers $n$, $k$, and $m$, respectively.

Question:

Randomly place $n$ balls into $k$ bins, with no bin containing more than one ball. What is the probability that there will be (at least) $m$ consecutive bins that are empty?

I suspect that this is a known counting problem (as framed, or through some equivalent rephrase), so I include a reference-request tag. An original solution would, of course, be welcome as well!

As in an already included answer: Specific cases (e.g., $m=2$) would be helpful, too.

5

There are 5 best solutions below

3
On BEST ANSWER

We show the probability $p_m(k,n)$ to randomly place $n$ balls into $k$ bins with no bin containing more than one ball, so that at least $m$ consecutive bins are empty is \begin{align*} \color{blue}{p_m(k,n)=\binom{k}{n}^{-1}\sum_{j=1}^{\lfloor k/m\rfloor}\binom{k-mj}{n}\binom{n+1}{j}(-1)^{j+1}}\tag{0} \end{align*}

In order to do so we consider the binary alphabet $V=\{0,1\}$ and decode empty bins with $0$ and bins where a ball is placed with $1$.

  • We are looking for the number $g_m(k,n)$ of binary words of length $k$ which contain $n$ $1$'s and runs of $0$ with length at most $m-1$. The number of wanted words is \begin{align*} f_m(k,n)=\binom{k}{n}-g_m(k,n) \end{align*}
  • Since there is a total of $\binom{k}{n}$ binary words of length $k$ with $n$ $1$'s, the probability $p_m(k,n)$ is \begin{align*} p_m(k,n)=\binom{k}{n}^{-1}f_m(k,n) \end{align*}

Words with no consecutive equal characters at all are called Smirnov or Carlitz words. See example III.24 Smirnov words from Analytic Combinatorics by Philippe Flajolet and Robert Sedgewick for more information.

A generating function for the number of Smirnov words over a binary alphabet is given by \begin{align*} \left(1-\frac{2z}{1+z}\right)^{-1}\tag{1} \end{align*}

We replace occurrences of $0$ in a Smirnov word by one up to $m-1$ zeros to generate strings having runs of $0$ with length less than $m$. \begin{align*}\ z\longrightarrow z+z^2+\cdots+z^{m-1}=\frac{z\left(1-z^{m-1}\right)}{1-z}\tag{2} \end{align*}

Since there are no restrictions to the length of runs of $1$'s we replace occurrences of $1$'s in a Smirnov word by one or more $1$s. \begin{align*}\ z\longrightarrow z+z^2+\cdots=\frac{z}{1-z}\tag{3} \end{align*}

We substitue (2) and (3) in (1). Since we want to keep track of the number of zeros we put $tz$ instead of $z$ in (2). We obtain

\begin{align*} \left(1- \frac{\frac{tz\left(1-(tz)^{m-1}\right)}{1-tz}}{1+\frac{tz\left(1-(tz)^{m-1}\right)}{1-tz}}-\frac{\frac{z}{1-z}}{1+\frac{z}{1-z}}\right)^{-1} &=\frac{1-(tz)^m}{1-z(1+t-(tz)^{m})}\tag{4} \end{align*}

Denoting with $[z^n]$ the coefficient of $z^n$ in a series we obtain from (4) the number of wanted words as \begin{align*} \color{blue}{f_m(k,n)}&=\binom{k}{n}-g_m(k,n)\\ &\color{blue}{=\binom{k}{n}-[z^kt^{k-n}]\frac{1-(tz)^m}{1-z(1+t-(tz)^{m})})}\tag{5} \end{align*}

Example: Let's look at OPs example. We set $k=10,n=7$ and $m=2$ and we obtain with some help of Wolfram Alpha \begin{align*} \color{blue}{f_2(10,7)}&=\binom{10}{7}-[z^{10}t^3]\frac{1-(tz)^2}{1-z(1+t-(tz)^{2})}\\ &=120-[z^{10}t^3]\left(1 + (t + 1) z + (2 t + 1) z^2\right.\\ &\qquad\left. + \cdots + (6 t^5 + 35 t^4 + \color{blue}{56} t^3 + 36 t^2 + 10 t + 1) z^{10} + \cdots\right)\\ &=120-56\\ &\,\,\color{blue}{=64} \end{align*} with probability

\begin{align*} \color{blue}{p_2(10,7)}=\binom{10}{7}^{-1}f_2(10,7)=\frac{64}{120}\color{blue}{=\frac{8}{15}=0.5\dot{3}} \end{align*}

The blue colored coefficient of $z^{10}$ shows there are $\color{blue}{56}$ binary words of length $10$ with $3$ zeros and runs of $0$ with length less than $k=2$. The blue colored result $\color{blue}{64}$ is the number of valid words having runs of zeros of length at least $2$.

These $64$ words are $$ \begin{array}{cccc} \color{blue}{00}01111111&1\color{blue}{00}0111111&11\color{blue}{00}011111&111\color{blue}{00}01111\\ \color{blue}{00}10111111&1\color{blue}{00}1011111&11\color{blue}{00}101111&111\color{blue}{00}10111\\ \color{blue}{00}11011111&1\color{blue}{00}1101111&11\color{blue}{00}110111&111\color{blue}{00}11011\\ \color{blue}{00}11101111&1\color{blue}{00}1110111&11\color{blue}{00}111011&111\color{blue}{00}11101\\ \color{blue}{00}11110111&1\color{blue}{00}1111011&11\color{blue}{00}111101&111\color{blue}{00}11110\\ \color{blue}{00}11111011&1\color{blue}{00}1111101&11\color{blue}{00}111110&11101\color{blue}{00}111\\ \color{blue}{00}11111101&1\color{blue}{00}1111110&1101\color{blue}{00}1111&111011\color{blue}{00}11\\ \color{blue}{00}11111110&101\color{blue}{00}11111&11011\color{blue}{00}111&1110111\color{blue}{00}1\\ 01\color{blue}{00}111111&1011\color{blue}{00}1111&110111\color{blue}{00}11&11101111\color{blue}{00}\\ 011\color{blue}{00}11111&10111\color{blue}{00}111&1101111\color{blue}{00}1&\\ 0111\color{blue}{00}1111&101111\color{blue}{00}11&11011111\color{blue}{00}&\\ 01111\color{blue}{00}111&1011111\color{blue}{00}1&&\\ 011111\color{blue}{00}11&10111111\color{blue}{00}&&\\ 0111111\color{blue}{00}1&&&\\ 01111111\color{blue}{00}&&&\\ \\ 1111\color{blue}{00}0111&11111\color{blue}{00}011&111111\color{blue}{00}01&1111111\color{blue}{00}0\\ 1111\color{blue}{00}1011&11111\color{blue}{00}101&111111\color{blue}{00}10\\ 1111\color{blue}{00}1101&11111\color{blue}{00}110&11111101\color{blue}{00}\\ 1111\color{blue}{00}1110&1111101\color{blue}{00}1&\\ 111101\color{blue}{00}11&11111011\color{blue}{00}&\\ 1111011\color{blue}{00}1&&\\ 11110111\color{blue}{00}&&\\ &&\\ &&\\ \end{array} $$

Formula of $f_m(k,n)$:

We derive a formula of $f_m(k,n)$ from (5) manually. It is convenient to use the coefficient of operator $[z^k]$ to denote the coefficient of $z^k$ in a series.

We obtain from (5) \begin{align*} \color{blue}{f_m(k,n)}&=\binom{k}{n}-[z^kt^{k-n}]\frac{1-(tz)^m}{1-z(1+t-(tz)^m}\\ &=\binom{k}{n}-[z^kt^{k-n}]\sum_{j=0}^\infty z^j\left(1+t-(tz)^{m}\right)^j\left(1-(tz)^m\right)\tag{6}\\ &=\binom{k}{n}-[t^{k-n}]\sum_{j=0}^k [z^{k-j}]\left(1+t-(tz)^{m}\right)^j\left(1-(tz)^m\right)\tag{7}\\ &=\binom{k}{n}-[t^{k-n}]\sum_{j=0}^k [z^j]\left(1+t-(tz)^{m}\right)^{k-j}\left(1-(tz)^m\right)\tag{8}\\ &=-[t^{k-n}]\sum_{j=1}^{\lfloor k/m\rfloor}[z^{mj}]\left(1+t-(tz)^{m}\right)^{k-mj}\left(1-(tz)^m\right)\tag{9}\\ &=-[t^{k-n}]\sum_{j=1}^{\lfloor k/m\rfloor}[z^{mj}]\sum_{l=0}^{k-mj}\binom{k-mj}{l}\left(1-(tz)^m\right)^{l+1}t^{k-mj-l}\tag{10}\\ &=-[t^{k-n}]\sum_{j=1}^{\lfloor k/m\rfloor}[z^{mj}]\sum_{l=0}^{k-mj}\binom{k-mj}{l}\sum_{u=0}^{l+1}\binom{l+1}{u}(-1)^{u}(tz)^{mu}t^{k-mj-l}\tag{11}\\ &=[t^{k-n}]\sum_{j=1}^{\lfloor k/m\rfloor}\sum_{l=0}^{k-mj}\binom{k-mj}{l}\binom{l+1}{j}(-1)^{j+1}t^{k-l}\tag{12}\\ &\,\,\color{blue}{=\sum_{j=1}^{\lfloor k/m\rfloor}\binom{k-mj}{n}\binom{n+1}{j}(-1)^{j+1}}\tag{13}\\ \end{align*} with probability \begin{align*} \color{blue}{p_m(k,n)=\binom{k}{n}^{-1}f_m(k,n)} \end{align*} in accordance with (0).

Comment:

  • In (6) we do a geometric series expansion.

  • In (7) we apply the rule $[z^{p-q}]A(z)=[z^p]z^qA(z)$. We also set the upper limit to $k$, since other terms do not contribute.

  • In (8) we change the order of summation $j\to k-j$.

  • In (9) we sum over multiples of $m$ only by setting $j\to mj$ since we have $z^{m}$ in the binomial expansion. We also note that $\binom{k}{n}$ is canceled with the first summand $j=0$ since $[z^0t^{k-n}](1+t-(tz)^m)^k(1-(tz)^m)=[t^{k-n}](1+t)^k=\binom{k}{k-n}=\binom{k}{n}$.

  • In (10) we expand the binomial.

  • In (11) we expand the binomial.

  • In (12) we select the coefficient of $z^{mj}$ by selecting the summand with $u=j$.

  • In (13) we select the coefficient of $t^{k-n}$ by selecting the summand with $l=n$.

Example revisited:

By calculating OPs example ($k=10,n=7,m=2$) now with formula (13) we obtain \begin{align*} \color{blue}{f_2(10,7)}&=\sum_{j=1}^5\binom{10-2j}{7}\binom{8}{j}(-1)^{j+1}\\ &=\binom{8}{7}\binom{8}{1}\\ &\,\,\color{blue}{=64} \end{align*} in accordance with the result above.

Note that here we only take the summand $j=1$, since other terms have lower index $7$ greater than the upper index $10-2j$, so that the binomial coefficients are zero.

7
On

Denote the sizes of the runs of empty bins as $x_1, \ldots, x_{n + 1}$. (Note that there are at most this many runs, so runs may be empty; i.e., $x_j = 0$.) Now, count the number of nonnegative integer solutions to

$$ x_1 + \cdots + x_{n + 1} = k - n $$

that have at least one $x_j \ge m$. This is equivalent to solving the problem.

To count this quantity, use the facts that:

  1. The number of nonnegative integer solutions to $$ x_1 + \cdots + x_k = n $$ is $$ \binom{n + k - 1}{k - 1}\text. $$ This is a simple stars and bars argument.
  2. The number of nonnegative integer solutions to $$ x_1 + \cdots + x_k = n $$ with each $x_j < c$ is $$ \binom{n + k - 1}{k - 1} - \sum_{i = 1}^{\lfloor n / {c} \rfloor} {(-1)}^{i + 1} \binom{k}{i}\binom{n - c i + k - 1}{k - 1}\text.\tag{*}\label{*} $$ To show (\ref{*}), recall the complementary form of the inclusion–exclusion principle. Namely, we can count the number of nonnegative integer solutions with each $x_j < c$ as $$ \begin{align} && \text{the number of all (unrestricted) solutions} \\ &-& \text{the number of solutions with one } x_j \geq c \\ &+& \text{the number of solutions with two } x_j \geq c \\ &-& \cdots \\ &{(-1)}^{\lfloor n / {c} \rfloor}& \text{the number of solutions with }\lfloor n / {c} \rfloor \text{ } x_j \geq c\text. \end{align} $$ Now, how do you count the number of solutions that have $i$-many $x_j \geq c$? It's quite simple; write those $x_j$ as $c + y_j$ (since we already know they're greater than or equal to $c$), and you already have a reduction to the stars and bars argument, but with $n' = n - c i$. This tells us that the number of solutions with $i$-many $x_j \geq c$ is $\binom{n - c i + k - 1}{k - 1}$. The $\binom{k}{i}$ factor counts the number of ways to have this scenario.
0
On

There's a nice way to do this if $m = 2.$ I describe it here, although it does not solve the more general problem indicated in the question.

We can get a classic "random" arrangement if the bins are identifiable and the balls are not distinguishable, so I'll make those assumptions.

In each bin we either place one ball or no balls. We choose $n$ of the bins in which to place the $n$ balls, leaving $n - k$ empty. The number of ways to do this is $$\binom k{k-n} = \binom kn,$$ each way equally probable.

Another way to think about those same arrangements of balls and bins is that we have $k - n$ empty bins, with a run of zero or more full bins before the first empty bin, after the last empty bin, and between each consecutive pair of empty bins. That is, we are putting $n$ balls into $k - n + 1$ "super bins", where each "super bin" is a run of zero or more filled bins.

As a check on the last interpretation of the problem, the number of ways to put zero or more indistinguishable items into each of $k - n + 1$ containers, placing a total of $n$ items, is $$ \binom{n + (k - n + 1) - 1}{(k - n + 1) - 1} = \binom k{k - n}.$$

So that is the number of equally probable events in our sample space. Now let's count the number of these events that do not have two consecutive empty bins. That means that the $k - n - 1$ runs of full bins that lie between consecutive pairs of empty bins must each include at least one full bin. Having determined that we need $k - n - 1$ balls placed in this way, we see that there are no such arrangements if $n < k - n - 1,$ that is, if $2n - k + 1 < 0.$ On the other hand, if $n + 1\geq k$ there are not even two empty bins anywhere and so it is guaranteed that we will not have two adjacent empty bins. But if $2n - k + 1 \geq 0$ and $n + 1 < k,$ then the number of ways to place the remaining $2n - k + 1$ full bins into the $k - n + 1$ runs (including the ones on each end) is $$ \binom{(2n - k + 1) + (k - n + 1) - 1}{(k - n + 1) - 1} = \binom{n + 1}{k - n}. $$

So the probability of two consecutive empty bins is $0$ if $n + 1 \geq k$; if $2n - k + 1 < 0$ the probability is $1$; and in all other cases the probability is $$ 1 - \frac{\displaystyle\binom{n + 1}{k - n}}{\displaystyle\binom k{k - n}} = 1 - \frac{(n + 1)! \,n!}{(2n - k + 1)!\, k!}. $$

5
On

There is the possibility to calculate numerically the result with a rather simple recursive equation.

We will assume $m = 2$ in the following.
To simplify the notation, we will consider that $0$ represents the case "no ball" and $1$ the case "with a ball".
We note $f(k, n)$ the probability associated with $k$ and $n$.
We represent the set of possibilities as a graph.
We consider three cases for the first bins:

  • 1 in the first bin. Probability $\frac{n}{k}$. The subgraph corresponds to $f(k-1, n-1)$
  • 01 in the first 2 bins: Probability $\frac{k-n}{k} \times \frac{n}{k-1}$. The subgraph corresponds to $f(k-2, n-1)$
  • 00 in the first 2 bins: Probability $\frac{k-n}{k} \times \frac{k-1-n}{k-1}$ -> we get the $m$ holes.

Then, we obtain, for $m = 2$

$$f(k,n) = \frac{n}{k} f(k-1, n-1) + \frac{n(k-n)}{k(k-1)} f(k-2, n-1) + \frac{(k-n)(k-1-n)}{k(k-1)}$$

We note in addition that $f() = 0$ if $k-n \le m-1$ and that $f() = 1$ if $n=0$ and $k \ge m$

This recursive formula can be generalised for higher values of $m$, although it becomes more difficult.

This is clearly not as satisfactory as a close form formula. However, it can be represented on a graph rather easily, and it is easy to program.
It may be used for secondary school students.

Note: the program gives $f(10, 7, 2) = \frac{8}{15}$

EDIT:
When I arrived at this point, I suspected something ...
Is there a much simpler solution ?
I thought having found a simple formula, it was a mistake ...

EDIT 2: C++ programme for $m = 2$

#include <iostream>
#include <iomanip>

double f2(int k, int n) {       // m = 2
    if ((k - n) <= 1) return 0.0;
    if (n == 0) return 1.0;
    double res = double ((k-n)*(k-1-n)) / (k*(k-1))
        + (double (n*(k-n)) / (k*(k-1))) * f2(k-2, n-1)
        + (double (n)/k) * f2(k-1, n-1);
    return res;
}

int main() {
    int k = 10;
    int n = 7;
    double prob = f2(k, n);
    std::cout << "f = " << std::setprecision(12) << prob << "\n";
    return 0;
}

EDIT 3: Display of a part of the graph.
"0:3/10" means that the line above corresponds to "0" (no ball), and that the corresponding probability is equal to 3/10. SB means "subgraph", and the near figure corresponds to the probability associated to this subgraph. This also illustrates the low efficiency of the scheme in terms of computation. However, again, it was not the goal of this proposal.

                      f(10,7)=8/15
                        /      \
                       /        \
                      /          \
                   0:3/10       1:7/10
                  SB:1/8        SB:f(9,6)=7/12                     
                  /    \               |     \
                 /      \              |      \
                /        \             |       \
             0:2/9      1:7/9         0:1/3     1:2/3
            SB:1.0   SB:f(8,6)=1/4   SB:13/28   SB:f(8,5)=9/14
                        /     \        |   \        |       \
                       /       \       |    \       |        \
                                       |     \
                                     0:1/4    1:3/4
                                     SB:1.0   SB:f(7,5)=2/7
                                                  /     \
                                                 /       \ 
1
On

This is supposed to be an integration to d125q's answer, as to better explain how to reach to the formula he gave, but it is too long for a comment.

You are placing $0$ or $1$ ball in every bin.
We are clearly speaking of undistiguishable balls.
While, speaking of consecutive bins, that implies that the bins be distinguishable (i.e. labelled, i.e. aligned in a row).

Then your problem is equivalent to :

given all the binary strings of length $k$, with $n$ ones (and $n-k$ zeros), how many of them will contain one or more runs of consecutive zeros whose length is $m$ or greater ?

The number of binary strings of length $k$ and with $n$ ones is $\binom{k}{n}$.
So the complement of the above will be the number of strings where the length of each run of consecutive zeros is less than $m$, and naturally, in a binary string we can exchange the zeros with the ones.

Now, to keep congruence with the precedent posts I am going to cite, let me change the formulation and the parameters of the problem with the following:

Consider a binary string with $s$ "$1$"'s and $m$ "$0$"'s in total. Let's put an additional (dummy) fixed $0$ at the start and at the end of the string. We individuate as a run the consecutive $1$'s between two zeros, thereby including runs of null length. With this scheme we have a fixed number of $m+1$ runs.
(refer also to the similar post).

Bernoulli_runs_1

Then, imposing that each run does not contain more than $r$ ones is equivalent to compute $$N_{\,b} (s,r,m+1) = \text{No}\text{. of solutions to}\;\left\{ \begin{gathered} 0 \leqslant \text{integer }x_{\,j} \leqslant r \hfill \\ x_{\,1} + x_{\,2} + \cdots + x_{\,m+1} = s \hfill \\ \end{gathered} \right.$$ which as explained in this other post is expressed as $$ N_b (s,r,m + 1)\quad \left| {\;0 \leqslant \text{integers }s,m,r} \right.\quad = \sum\limits_{\left( {0\, \leqslant } \right)\,\,k\,\,\left( { \leqslant \,\frac{s} {r}\, \leqslant \,m + 1} \right)} {\left( { - 1} \right)^k \left( \begin{gathered} m + 1 \\ k \\ \end{gathered} \right)\left( \begin{gathered} s + m - k\left( {r + 1} \right) \\ s - k\left( {r + 1} \right) \\ \end{gathered} \right)} $$ Note that by rewriting the binomial in the way shown above render the summation bounds implicit in the binomial, so that they can be omitted (that's why they are indicated in brackets), which is of great advantage in performing algebraic manipulations.

I think that an effective way to understand the above is by developing the polynomial $$ \eqalign{ & \underbrace {\left( {1 + x + x^{\,2} + \cdots + x^{\,r} } \right) \cdot \left( {1 + \cdots + x^{\,r} } \right) \cdot \; \cdots \; \cdot \left( {1 + \cdots + x^{\,r} } \right)}_{m\;times} = \cr & = \left( {{{1 - x^{\,r + 1} } \over {1 - x}}} \right) = \sum\limits_{0\, \le \,\,s\,\,\, \le \,m\,r} {N_b (s,r,m)\;x^{\,s} } \cr} $$ thereby having the o.g.f. in $s$, and from there easily getting the recursion $$ N_{\,b} (s,r,m + 1) = \sum\limits_{i\, = \,0}^r {N_{\,b} (s - i,r,m)} $$

Finally note that the $N_b$ are called "r-nomial coefficients" in OEIS: see e.g. Table A008287.