Balls and bins when max load $\geq 3$


There are $m$ balls, each thrown independently and uniformly at random into one of $n$ bins.

Let $X_i$ be the number of balls in the $i$-th bin, and let $Y=\max_i X_i$.

I know that when $m=\Theta(\sqrt{n})$, with high probability $Y\geq 2$. The proof is as follows.

$$\Pr(Y\geq 2)=1-\Pr(Y=1)=1-\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots \left(1-\frac{m-1}{n}\right).$$ Applying $1-x\leq e^{-x}$ to each factor, we get $$\Pr(Y\geq 2)\geq 1-e^{-\frac{1}{n}}e^{-\frac{2}{n}}\cdots e^{-\frac{m-1}{n}}=1-e^{-\frac{m(m-1)}{2n}}.$$ Letting $m=4\sqrt{n}+1$, so that $\frac{m(m-1)}{2n}\geq 8$, we get $$\Pr(Y\geq 2)\geq 1-e^{-8}.$$ Similarly, we can apply $1-x\geq e^{-2x}$ (valid for $0\leq x\leq \frac{1}{2}$) to get a matching upper bound on $\Pr(Y\geq 2)$.
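Not part of the proof, but as a quick sanity check one can simulate the $m\approx 4\sqrt{n}$ regime and compare the empirical frequency of $Y\geq 2$ against the bound $1-e^{-8}$ (a rough Python sketch; the helper names are mine):

```python
import random
from math import sqrt, exp

def max_load(m, n, rng):
    """Throw m balls into n bins uniformly at random; return the maximum load."""
    counts = [0] * n
    for _ in range(m):
        counts[rng.randrange(n)] += 1
    return max(counts)

n = 10_000
m = int(4 * sqrt(n)) + 1          # m = 4*sqrt(n) + 1, as in the proof above
rng = random.Random(0)
trials = 20_000
p_hat = sum(max_load(m, n, rng) >= 2 for _ in range(trials)) / trials
print(p_hat, 1 - exp(-8))         # empirical frequency vs. the lower bound
```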

My question: how can one prove that when $m=\Theta(n^{\frac{2}{3}})$, with high probability $Y\geq 3$? (This is stated in a lecture without proof.)

My try:

$Pr(Y\geq 3)=1-Pr(Y=1)-Pr(Y=2)$, but I don't know how to calculate $Pr(Y=2)$.


Best answer:

Number the balls from $1$ to $m$. For any $1\le i<j<k\le m$, define the event $$ E_{i,j,k}=\{\text{event that balls $i,j$ and $k$ are in the same box}\}. $$ Then, let $N$ be the number of these events that occur, that is, the number of triples of balls which end up in the same box: $$ N=\sum_{1\le i<j<k\le m}{\bf 1}(E_{i,j,k}). $$ Note that $Y\ge 3$ if and only if $N>0$, so we just need a lower bound for $P(N>0)$. This is provided by the second moment inequality: $$ P(N>0)\ge \frac{(E[N])^2}{E[N^2]}=\frac{(E[N])^2}{(E[N])^2+\operatorname{Var}N}.\tag1 $$

Each fixed triple of balls lands in a common box with probability $n\cdot n^{-3}=n^{-2}$, so by linearity of expectation, $$E[N]=\binom m3\frac 1{n^2}.\tag2$$

All that remains is to handle the variance. We can write $$ \newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}} \Var N=\sum_{1\le i<j<k\le m} \Var[{\bf 1}(E_{i,j,k})]+\sum_{}\Cov({\bf 1}(E_{i,j,k}),{\bf 1}(E_{\ell ,p,q})) $$ where the second summation ranges over all ways to choose distinct triples $\{i,j,k\}\neq\{\ell,p,q\}$ so that $i<j<k$ and $\ell<p<q$. The first summation is easy, since $$ \sum_{1\le i<j<k\le m} \Var[{\bf 1}(E_{i,j,k})]=\binom{m}3 \frac 1{n^2}\left(1-\frac1{n^2}\right). $$

The covariance summation is where most of the work is. Each covariance summand is equal to $$ \Cov({\bf 1}(E_{i,j,k}),{\bf 1}(E_{\ell ,p,q}))=P(E_{i,j,k}\cap E_{\ell,p,q})-P(E_{i,j,k})P(E_{\ell,p,q}). $$ This quantity depends on the number of indices that $\{i,j,k\}$ has in common with $\{\ell,p,q\}$. Fortunately, if the triples are disjoint, then the events are independent, so the covariance is zero. The same is true if they have exactly one index in common (prove this). All that remains is the case where they have two indices in common, say $p=i$ and $q=j$.
In this case, we have $$ P(E_{i,j,k}\cap E_{i,j,\ell})-P(E_{i,j,k})P(E_{i,j,\ell})=\frac1{n^3}-\left(\frac1{n^2}\right)^2=\Theta(n^{-3}). $$ Since the number of ways to choose $\{i,j,k\}$ and $\{\ell,p,q\}$ so that they intersect in two places is $\Theta(m^4)$ (there are $\binom{m}2$ ways to choose the overlapping pair, then $(m-2)(m-3)$ ways to choose the other two numbers), the entire covariance summation is $$ \sum_{}\Cov({\bf 1}(E_{i,j,k}),{\bf 1}(E_{\ell ,p,q}))=\Theta(m^4n^{-3}). $$ This is negligible in the regime where $m=\Theta(n^{2/3})$, since then $m^4n^{-3}=\Theta(n^{-1/3})\to 0$. Therefore, we get $$ \Var N=\binom{m}3n^{-2}(1-n^{-2})+o(1)=\binom{m}3n^{-2}+o(1)\tag3 $$ where $o(1)$ denotes a term going to zero as $n\to\infty$.

Conclude by combining $(1),(2)$ and $(3)$.
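To complement this argument with numbers (my own sketch, not part of the answer): taking $n=15^3=3375$ so that $m=n^{2/3}=225$ exactly, $E[N]=\binom m3 n^{-2}$ is a constant, and a simulation suggests $\Pr(Y\geq 3)$ is indeed bounded away from zero:

```python
import random
from math import comb

def max_load(m, n, rng):
    """Throw m balls into n bins uniformly at random; return the maximum load."""
    counts = [0] * n
    for _ in range(m):
        counts[rng.randrange(n)] += 1
    return max(counts)

n = 3375                     # n = 15^3, so n^(2/3) = 225 exactly
m = 225                      # m = n^(2/3)
EN = comb(m, 3) / n**2       # E[N] = C(m,3)/n^2, a constant in this regime
rng = random.Random(1)
trials = 5000
p_hat = sum(max_load(m, n, rng) >= 3 for _ in range(trials)) / trials
print(f"E[N] = {EN:.3f}, estimated P(Y>=3) = {p_hat:.3f}")
```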

Another answer:

Disclaimer: I couldn't solve it, but maybe this method can be salvaged or gives some ideas.

I assume the balls and the boxes are both distinguishable. Let's calculate

$$\mathbb{P}(Y\leq 2) = \frac{\text{number of ways to put so that } Y\leq2}{n^m}.$$

The numerator equals $m!$ times the coefficient of $x^m$ in $(1+x+\frac{1}{2}x^2)^n$. Using the multinomial theorem we can write

$$(1+x+\frac{1}{2}x^2)^n = \sum_{i+j+k=n} {n \choose i \space j \space k}\frac{1}{2^k}x^{j+2k} $$

To get the coefficient of $x^m$ (let's call it $c_m$), solve the equations $i+j+k=n$ and $j+2k=m$ in terms of $k$ to get

$$c_m = \sum_{k=0}^{\lfloor \frac{m}{2} \rfloor} {n \choose n-m+k, \space m-2k, \space k}\frac{1}{2^k} $$

So

$$ \mathbb{P}(Y\leq 2) = \frac{m!c_m}{n^m} = \sum_{k=0}^{\lfloor \frac{m}{2} \rfloor} \frac{m!n!} {n^m(n-m+k)! (m-2k)! k! 2^k} $$
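A quick brute-force check of this formula for small $m$ and $n$ (my own Python sketch; the function names are mine):

```python
from itertools import product
from math import factorial

def p_maxload_le2_formula(m, n):
    """P(Y<=2) = m! * c_m / n^m, with c_m given by the multinomial sum above."""
    total = 0.0
    for k in range(m // 2 + 1):
        i = n - m + k              # number of empty bins
        j = m - 2 * k              # bins with one ball; k bins with two balls
        if i < 0:
            continue
        total += factorial(n) // (factorial(i) * factorial(j) * factorial(k)) / 2**k
    return factorial(m) * total / n**m

def p_maxload_le2_bruteforce(m, n):
    """Enumerate all n^m assignments and count those with max load <= 2."""
    good = 0
    for assignment in product(range(n), repeat=m):
        good += max(assignment.count(b) for b in range(n)) <= 2
    return good / n**m

p_f = p_maxload_le2_formula(5, 4)
p_b = p_maxload_le2_bruteforce(5, 4)
print(p_f, p_b)
```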

Using the Stirling bounds $e(\frac{x}{e})^x \leq x! \leq ex(\frac{x}{e})^x$ we have

$$\mathbb{P}(Y\leq 2) \leq e^{-1-m}m^{m+1}n^{n+1-m} \sum_{k=0}^{\lfloor \frac{m}{2} \rfloor} \frac{1}{(n-m+k)^{n-m+k}(m-2k)^{m-2k}(2k)^k} $$

Now, assuming (this isn't true!!, but I feel like it's kind of almost true) that the term in the sum is at its biggest for $k=0$, we would have (there are at most $\frac{m}{2}+1 \leq m$ terms)

$$\mathbb{P}(Y\leq 2) \leq e^{-m - 1} m^2 n^{-m + n + 1} (n - m)^{m - n} $$

and plugging in $m=n^{2\over 3}$, according to Wolfram Alpha, this tends to zero.

EDIT: We can actually determine which term of the sum is the largest by considering the function (take the logarithm to make it more manageable)

$$g(k) = -(n-m+k)\log(n-m+k) - (m-2k)\log(m-2k)-k\log(2k).$$

Differentiating and setting the derivative to zero, we get a quadratic equation in $k$; picking the minus sign (so that $k$ lies in the admissible range) gives the solution

$$k_{\max} = \frac{n+m-\sqrt{(n+m)^2-2m^2}}{2}.$$

This is close to zero because $n$ is much bigger than $m$. This still isn't a rigorous proof but maybe it will show the way...

Not sure if this will help, but using the inequality $1-\frac{x}{2}-\frac{x^2}{2} \leq \sqrt{1-x}$ (valid for $0\leq x\leq 1$) we get

$$2k_{\max} = (n+m)\left(1- \sqrt{1-\frac{2m^2}{(n+m)^2}} \right) \leq \frac{m^2}{n+m} + \frac{2m^4}{(n+m)^3}. $$
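One can check the location of the maximizing term numerically (a rough sketch, names mine): since $g$ is concave, the discrete argmax should sit next to the stationary point $k_{\max}$, and $2k_{\max}$ should obey the bound above. For instance, with $n=1000$, $m=100$:

```python
from math import log, sqrt

def g(k, m, n):
    """Log of the k-th summand's denominator-dependent factor:
    -(n-m+k)log(n-m+k) - (m-2k)log(m-2k) - k*log(2k)."""
    t = -(n - m + k) * log(n - m + k) - (m - 2 * k) * log(m - 2 * k)
    if k > 0:
        t -= k * log(2 * k)        # the k=0 term contributes 0 (limit k*log(2k) -> 0)
    return t

n, m = 1000, 100
k_star = max(range(m // 2), key=lambda k: g(k, m, n))    # discrete argmax
k_max = (n + m - sqrt((n + m) ** 2 - 2 * m ** 2)) / 2    # stationary point
print(k_star, k_max)
```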