Probability that an exact allocation of balls into bins happens?

181 Views Asked by At

Say I have an infinite supply of bins, $1, 2 ,3 ...$, each having the same capacity for $c$ balls.

A process produces a random amount of $b$ balls, $min<=b<=max$ per iteration. All possible values of $b$ being equiprobable, each iteration independent.

As balls are received at each iteration, they are placed into the lowest numbered bin with remaining capacity, any excess over bin capacity going into the next highest numbered bin.

Sort of like batches of water filling one jug, which can overflow into a next jug, which itself can overflow into a subsequent jug, and so on.

I'd like to calculate the probability that over $i$ iterations of the process, there is at least one iteration where all bins with balls are each exactly at their common capacity $c$.

I honestly haven't a clue how to even begin here, apologies I can't show what I've attempted.

Edit: A simple example case.

Let $min, max, c, i$ = $1, 2, 3, 4$

We have then the following possible tuples of outcomes for the $i=4$ iterations:

{{1, 1, 1, 1}, {1, 1, 1, 2}, {1, 1, 2, 1}, {1, 1, 2, 2}, {1, 2, 1, 1}, {1, 2, 1, 2}, {1, 2, 2, 1}, {1, 2, 2, 2}, {2, 1, 1, 1}, {2, 1, 1, 2}, {2, 1, 2, 1}, {2, 1, 2, 2}, {2, 2, 1, 1}, {2, 2, 1, 2}, {2, 2, 2, 1}, {2, 2, 2, 2}}

Accumulating each gives the sums for each case at each iteration (bold where the sum is exactly divisible by $c$):

{{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {1, 2, 4, 6}, {1, 3, 4, 5}, {1, 3, 4, 6}, {1, 3, 5, 6}, {1, 3, 5, 7}, {2, 3, 4, 5}, {2, 3, 4, 6}, {2, 3, 5, 6}, {2, 3, 5, 7}, {2, 4, 5, 6}, {2, 4, 5, 7}, {2, 4, 6, 7}, {2, 4, 6, 8}}

Simply counting the possible outcomes that contain at least one element meeting the criteria, we see that the probability of at least one of the sums over 4 iterations with minimum 1 ball produced and maximum 2 balls produced per iteration, with bin capacity 3, being divisible exactly by the capacity is $7/8$.

I perhaps worded the "...haven't a clue..." above poorly, and meant "...haven't a clue how to do this efficiently...".

In this trivial case, a simple Markov chain with transition matrix:

$\left( \begin{array}{cccc} 0 & \frac{1}{2} & \frac{1}{2} & 0 \\ 0 & \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 0 & 0 & 1 \\ \end{array} \right)$

suffices to get what I'm after for some arbitrary $i$ iterations.

However, for larger cases (I'd like to apply this for exact results for $min \approx 10^3, max \approx 10^4, c \approx 10^4, i \approx 10^3$) the number of states/transitions becomes huge.

I'm fine with "there's probably not a more efficient way vs your Markov chain" if that's the determination of the experts here, but I never cease to be surprised by efficient methods for calculations that seem intractable at first sight that appear in answers here.

Hence the question.

1

There are 1 best solutions below

0
On

Edited 12 hours later, after OP revised question, and after I found confusing typos in my original answer.

So consider the random walk $S_i = B_1+ B_2+\dots+B_i \bmod c$, where the $B_i$ are chosen uniformly and independently from $\{m,m+1,\ldots,M\}$, and the killed random walk $S'_i$ obtained by removing mass when $S_i \equiv 0 \pmod c$, for $i>0$. The probability you want, that $S_t$ hits $0 \bmod c$ before time $i$ is 1 minus the probability that $S'_i$ is alive at time $i$. It is easy to see that the probability that $S'_i$ is alive decays to 0 exponentially quickly, but I don't have an estimate of the rate in mind just now.

As you suspected, your combination of parameters is awkward. I don't know how numerically stable the following procedure is, but it might give usable numerical approximations. Let $\lambda=(\lambda_0,\ldots,\lambda_{c-1}) \in\mathbb C^c$ be the mod $c$ FFT of the uniform distribution on $\{m,m+1,\ldots,M\}$. Initialize a complex vector $z=(z_0,\ldots z_{c-1})\in\mathbb C^c$ with all $1$s: $z\leftarrow(1,1,\dots,1)$. Repeat the following process $i$ times: first, update $z$ by multiplying coordinatewise by $\lambda$, and then, subtract from each entry in $z$ the average of all the entries in $z$ . (Simultaneously, not serially, of course.) After $i$ steps of this, $z$ should be the FFT of the probability distribution of the killed process $S_i'$, and the probability it is alive should be $z_0$, and your desired probability should be $1-z_0$. (The total cost here is about $O(c\log c + ic)$ which compares favorably to a direct matrix multiplication methods involving $i$ matvec operations which would take $O(c^2)$ or $O(c\log c)$ apiece.)