Growth rate of the expected number of draws with replacement from an urn with $n$ balls needed before you draw the same ball twice

113 Views Asked by At

Consider an urn with $n$ balls. Draw a ball from the urn and put it back. Repeat this until you draw a ball that you have drawn before. Let $E(n)$ denote the expected number of draws that result from this process. I have convinced myself by writing a program that $E(n)\in\Theta(√n)$ however I would like a formal proof of this. Also I would like to know the constant coefficient $c$ such that $E(n) \sim c√n$. Please note that my question is related to this question here in that it considers the same underlying problem however my question is specifically interested in a proof of the growth rate as well as the coefficient.

2

There are 2 best solutions below

0
On BEST ANSWER

Even though the accepted answer is probably perfectly satisfying, let me just give some self-contained proof for future reference. I'm not sure how Flajolet and Sedgewick prove it, but I claim it is a pretty standard application of the Laplace method. First note that:

$$\sum_{k=0}^n \frac{n(n-1)\ldots(n-k+1)}{n^k}=\sum_{k=0}^n \binom{n}{k}\frac{1}{n^k}\int_0^\infty u^k e^{-u} du=\int_0^\infty (1+u/n)^n e^{-u}du$$

To get at the right scaling, let $t=u/\sqrt{n}$, so that:

$$\frac{E(n)}{\sqrt{n}}=\int_0^\infty \exp\left(n\log\left(1+\frac{t}{\sqrt{n}}\right)-t\sqrt{n}\right)dt$$

Now:

  • If $t$ is large (say $t\ge n^{1/100}$), the corresponding integral is exponentially small in $n$.
  • If $t$ is small (say $t\le n^{1/100}$), note that $n\log(1+t/\sqrt{n})-t\sqrt{n}=-t^2/2+O(t^3/\sqrt{n})$, so the corresponding integral is within a $O(n^{4/100}/\sqrt{n})$ additive factor of:

$$\int_0^{n^{1/100}} e^{-t^2/2} dt=\sqrt{\pi/2}+O(e^{-(n^{1/100})^2/2})$$

Hence $E(n)\sim \sqrt{n \pi/2}$

0
On

Perhaps the following should really be a comment, but it's getting a bit long for that.

The answer by bof in the linked-to question Expected number of draws until a repeat draw shows $E(n) = 1 + Q(n)$, where $$Q(n) = \sum_{1 \le k \le n} \frac{n!}{(n-k)! n^k}$$
$Q(n)$ is known in probability theory as the "Ramanujan Q function" or the "birthday function".

Theorem 4.8 in An Introduction to the Analysis of Algorithms, Second Edition by Robert Sedgewick and Philippe Flajolet shows $$Q(n) = \sqrt{\pi n /2} + O(1)$$ so $$E(n) = 1 + Q(n) = \sqrt{\pi n /2} + O(1)$$

EDIT

An alternative reference is The Art of Computer Programming, Second Edition, Volume 1: Fundamental Algorithms by Donald Knuth,section 1.2.11.3, which gives the more precise estimate $$Q(n) = \sqrt{\frac{\pi n}{2}} - \frac{1}{3} + \frac{1}{12} \sqrt{\frac{\pi}{2n}} - \frac{4}{135n} + \frac{1}{288}\sqrt{\frac{\pi}{2n^3}} + O(n^{-2}) $$