counting runs of empty bins

300 Views Asked by At

Here is a variation of balls and bins problem. Throwing $m$ balls uniformly and independently into $n$ bins labeled $0,1,2,\ldots,n-1$. Counting empty bins of run length $k$: There is a $k$-gap starting at bin $i$ if bins $i,i+1,\ldots, i+k-1$ are empty.

I defined an indicator variable $Z_i=1$ iff a $k$-gap starts at position $i$ and $0$ otherwise. Then, $\sum_i Z_i$ will give me total number of $k$-gaps. I am trying to solve the following problem:

1) If we have $Z_i$ and $Z_j$ for $0\leq i<j \leq n-k$ and letting $0\leq c \leq k-1$ be the overlap of intervals $i,i+1,i+2,\ldots,i+(k-1)$ and $j,j+1,j+2,\dots,j+(k-1)$. What is the conditional probability: $Pr(Z_j=0 \vert Z_i=0)$?

Here is what I attempted. Using definition of conditional probability $$ Pr(Z_j=0 \vert Z_i=0) = \frac{Pr(Z_j=0 \cap Z_i=0)}{Pr(Z_i=0)} $$

For the denominator, $Pr(Z_i=0)=1-Pr(Z_i=1)= 1-((1-(\frac{k}{n}))^m$ (ball lands in one of the other $n-k$ bins and there are total $m$ bins so prob is $(1-\frac{k}{n})^m$ which is $Pr(Z_i=1)$, thus $Pr(Z_i=0) = 1- Pr(Z_i=1)$.

I am having hard time getting an expression for numerator. If I understand correctly, $$ \begin{align} Pr(Z_i=0 \cap Z_j=0) &= 1- Pr(Z_i=1 \cup Z_j=1)\\ &= 1 - \left[Pr(Z_i=1) + Pr(Z_j=1)- Pr(Z_i=1 \cap Z_j=1)\right] \end{align} $$ The first terms of the expression are $(1-\frac{k}{n})^m + (1-\frac{k}{n})^m$. Not sure how to get $Pr(Z_i=1 \cap Z_j=1)$. Does that mean the ball does not land in $i, i+1,\ldots,j+(k-1))$?

2) Assume $k$ divides $n-k+1$. For $0< t< k-1$, let $J_t = {j \in \left[0,1,\ldots,n-k\right] / j = t\mod k}$. What is $E[\sum_{j \in J_t} Z_j]$. $E[\cdot]$ stands for the expected value of $(\cdot)$. Since $Z_j$ is an indicator, the sum of indicators is binomial and expectation of binomial is number of trials times the probability. Basically, we need to know number of elements in $J_t$ to get the probability. Is that correct?

1

There are 1 best solutions below

0
On

$1$) I agree with your work here.

Given that $Z_i$ and $Z_j$ overlap by $c$, which is to say $c = max\left(0, i + k - j\right)$ then: \begin{eqnarray*} Pr\left(Z_i = 1 \cap Z_j = 1\right) = \left(1-\dfrac{2k - c}{n}\right)^m \end{eqnarray*}

This is by reasoning similar to your own: All bins in $Z_i$ and $Z_j$ must remain empty and there are $2k-c$ of them.

The reason I used "max" is to cover the case where there's a gap (i.e. some bins) between $Z_i$ and $Z_j$. We don't want $c \lt 0$.

$2$) I'm confused a little by the notation $J_t = j \in \left[0,1,\ldots ,n-k\right]/j=t \bmod k$. It take it to mean $J_t = j \in \left[0,1,\ldots ,n-k\right] \mbox{ such that } j \equiv t \bmod k$. So that $J_t$ is a set of integers from $0$ to $n-k$ distance $k$ apart from each other.

For example, if $n=8$ and $k=3$ then \begin{eqnarray*} J_0 &=& \left\{0,3\right\} \\ J_1 &=& \left\{1,4\right\} \\ J_2 &=& \left\{2,5\right\} \\ \end{eqnarray*}

Note that regardless of $t$, $\vert J_t \vert = \dfrac{n-k+1}{k}$. So, \begin{eqnarray*} E\left[\sum_{j \in J_t}{Z_j}\right] &=& \sum_{j \in J_t}{E\left[Z_j\right]} \\ &=& \sum_{j \in J_t}{Pr\left(Z_j = 1\right)} \\ &=& \vert J_t \vert \times \left(1-\dfrac{k}{n}\right)^m \qquad\mbox{since the $Z_j$s are non-overlapping} \\ &=& \dfrac{n-k+1}{k} \times \left(1-\dfrac{k}{n}\right)^m. \\ \end{eqnarray*}

I don't think you have to worry about the probability distribution of the sum of the $Z_j$s because using the linearity of expectation (expectation of a sum is the sum of the expectations) makes it easier.