Expected number of duplicates

290 Views Asked by At

Suppose I have $m$ bins and throw $n\ll m$ balls into the bins uniformly at random. (In my application $n\sim m/\log m.$) What is the expected number of duplicates? In other words, if there are $k_i$ balls in bin $i$, I'm looking for the distribution of the random variable $$ D=\sum_{i=1}^m\max(k_i-1,0). $$

1

There are 1 best solutions below

0
On BEST ANSWER

As has been pointed out in the comments, linearity of expectation does not require independence. Thus, the expected number of duplicates, which is $n-m$ plus the expected number of empty bins, is $n-m$ plus $m$ times the probability for a given bin to be empty:

$$ E[D]=n-m+m\left(1-\frac1m\right)^n=n+m\left(\mathrm e^{-n/m}-1\right)+O\left(\frac nm\right)=\frac{n^2}{2m}+O\left(\frac nm\right)+O\left(\frac{n^3}{m^2}\right)\;. $$

The last result can also be contained by counting just the expected number of doubles,

$$ m\binom n2\frac1{m^2}\left(1-\frac1m\right)^{n-2}=\frac{n(n-1)}{2m}\left(1+O\left(\frac nm\right)\right)=\frac{n^2}{2m}+O\left(\frac nm\right)+O\left(\frac{n^3}{m^2}\right)\;, $$

and noting that the contribution from higher tuples is $O\left(\frac{n^3}{m^2}\right)$.