Birthday Problem: Asymptotics of Expected Time Until a Match Occurs

455 Views Asked by At

I'm working on a variant of the birthday problem that I haven't found discussed on this site.

Suppose the sequence $(X_n)$ of independent random variables takes values uniformly in $\{ 1,...,N \}$. Let $F_{N} = \min\{ m: X_m = X_k, k<m \}$ be the first time that a match is observed.

I want to know what can be said about $E(F_N)$ as $N \to \infty$.

It's easy to see that $$P(F_N = k) = \frac{N}{N} \frac{N-1}{N}... \frac{N - (k-2)}{N} \frac{k-1}{N}.$$

Hence, $$E(F_N) = \sum_{k=2}^{N+1} k \Big[\frac{N}{N} \frac{N-1}{N}... \frac{N - (k-2)}{N} \frac{k-1}{N} \Big]. $$

Any suggestions about where to go from here?

2

There are 2 best solutions below

0
On BEST ANSWER

The probability of the first match on the $k^\text{th}$ trial is $$ \begin{align} &\overbrace{\frac nn\frac{n-1}n\cdots\frac{n-k+2}n}^{\text{no match in $k-1$ trials}}-\overbrace{\frac nn\frac{n-1}n\cdots\frac{n-k+1}n}^{\text{no match in $k$ trials}}\\ &=\frac{n!}{n^{k-1}(n-k+1)!}-\frac{n!}{n^k(n-k)!}\\ &=\frac{n!}{n^{k-1}(n-k+1)!}-\frac{n!}{n^{k-1}(n-k+1)!}\frac{n-k+1}n\\ &=\frac{n!\,(k-1)}{n^k(n-k+1)!}\tag{1} \end{align} $$ Therefore, the expected value is $$ \begin{align} E(F_n)= &\sum_{k=0}^n\frac{n!\,k(k-1)}{n^k(n-k+1)!}\\ &=\frac{n!}{n^{n+1}}\sum_{k=0}^n\frac{k(k-1)}{(n-k+1)!}n^{n-k+1}\\ &=\frac{n!}{n^{n+1}}\sum_{k=1}^{n-1}\frac{(n-k+1)(n-k)}{k!}n^k\\ &=\frac{n!}{n^{n+1}}\sum_{k=1}^{n-1}\frac{n(n+1)-2kn+k(k-1)}{k!}n^k\\ &=\frac{(n+1)!}{n^n}\sum_{k=1}^{n-1}\frac{n^k}{k!}-\frac{2n!}{n^{n-1}}\sum_{k=0}^{n-2}\frac{n^k}{k!}+\frac{n!}{n^{n-1}}\sum_{k=0}^{n-3}\frac{n^k}{k!}\\ &=-\frac{(n+1)!}{n^n}+\frac{n!}{n^n}\sum_{k=0}^n\frac{n^k}{k!}\tag{2} \end{align} $$ Applying equation $(11)$ from this answer and Stirling's Approximation gives the expected value as $$ \bbox[5px,border:2px solid #C0A000]{E(F_n)=\frac12\sqrt{2\pi n}+\frac23+O\left(\frac1{\sqrt{n}}\right)}\tag{3} $$


Extended Asymptotics

Extending the computation we did for $(3)$, we get $$ E(F_n) =\sqrt{2\pi n}\left(\frac12+\frac1{24n}+\frac1{576n^2}\right) +\left(\frac23-\frac4{135n}+\frac8{2835n^2}\right) +O\left(\frac1{n^{5/2}}\right)\tag{4} $$

4
On

This is an elaboration of what @Did commented on $n$-th match.

Fix $K>0$. Denote by $F_N^{(n)}$ the time of $n$-th match and set $F_N^{(0)}=0$. For any fixed $n\geq 1$, we will find the joint CDF: For positive integers $k_1, \ldots , k_n $ with $0<k_1<k_2<\cdots <k_n\leq N$ and $k_n \leq K\sqrt N $, $$ \begin{align} P &(F_N^{(1)} =k_1, \ldots , F_N^{(n)} =k_n ) \\ &=\frac{k_1-1}N\prod_{i=1}^{k_1-2}\left(1-\frac{i}N\right) \frac{k_2-2}N\prod_{i=k_1-1}^{k_2-3}\left(1-\frac iN\right)\cdots \frac{k_n-n}N\prod_{i=k_{n-1}-(n-1) }^{k_n-(n+1)}\left(1-\frac iN\right)\\ &=\frac{k_1-1}N \cdots \frac{k_n-n}N \prod_{i=1}^{k_n-(n+1)}\left(1-\frac iN\right)\\ &=\frac{k_1-1}N \cdots \frac{k_n-n}N \exp\left( -\frac{(k_n-(n+1))^2}{2N}+O(N^{-2})\right). \end{align} $$ This gives $$ P (F_N^{(1)} =k_1, \ldots , F_N^{(n)} =k_n ) = \frac{k_1-1}{\sqrt N } \cdots \frac{k_n- n}{\sqrt N}\exp\left( -\frac12 \left(\frac{ k_n-(n+1) }{\sqrt N}\right)^2+O(N^{-2})\right) \frac 1{\sqrt N^n} . $$ Fix $0\leq x_1, \ldots , x_n\leq K$, and sum this up for $k_i\leq x_i \sqrt N$, we have as $N\rightarrow\infty$, $$ P\left( \frac{F_N^{(1)}}{\sqrt N}\leq x_1, \ldots , \frac{F_N^{(n)}}{\sqrt N} \leq x_n\right) \rightarrow \int_{0\leq t_1\leq \cdots \leq t_n, \ \forall i, t_i\leq x_i} t_1 \cdots t_n \exp \left(-\frac12 t_n^2\right) dV $$ (Think of this as summing the probabilities over the boxes with side length $1/\sqrt N$. The Dominated Convergence Theorem will suffice to justify this limit.)

Thus, the random vector $\left(\frac{F_N^{(1)}}{\sqrt N}, \ldots , \frac{F_N^{(n)}}{\sqrt N}\right)$ converges in distribution to the continuous random variable with PDF $$ f(t_1,\ldots , t_n) = t_1 \cdots t_n \exp\left(-\frac 12 t_n^2\right) \mathbf{1}_{0\leq t_1 \leq \cdots \leq t_n}. $$

The question was originally about the expectation in case with $n=1$. So, the above calculation suggests that the expectation can be similarly calculated as $N\rightarrow\infty$, $$ \mathbf{E}\left(\frac{F_N^{(1)}}{\sqrt N} \right)\rightarrow \int_0^{\infty} t_1^2 \exp\left(-\frac12 t_1^2\right) dt_1 = \sqrt{\frac{\pi}2}. $$

But, we need to treat the case with $k_1\neq O(\sqrt N)$. In the general case $k_n\neq O(\sqrt N)$. To do this, we use $$ \log(1-x) \leq -x. $$ Then for $K\sqrt N < k_n$, $$ \frac{k_n}{\sqrt N}P (F_N^{(1)} =k_1, \ldots , F_N^{(n)} =k_n ) \leq \frac{k_1\cdots k_{n-1}k_n^2}{\sqrt N^{n+1}} \exp\left( -\frac12 \left(\frac{ k_n-n-1 }{\sqrt N}\right)^2\right)\frac1{\sqrt N^n}. $$ Again by the Dominated Convergence Theorem, the right side after summing up for $0\leq k_1\leq \cdots \leq k_n$, becomes as $N\rightarrow\infty$, $$ \int_K^{\infty} \int_0^{t_n} \cdots \int_0^{t_2} t_1\cdots t_{n-1}t_n^2 \exp\left(-\frac12 t_n^2 \right) dt_1\cdots dt_n. $$ Note that this can be made arbitrarily small with sufficiently large $K$. This shows that the suggested calculation is valid. We now have $$ \mathbf{E}\left(\frac{F_N^{(n)}}{\sqrt N}\right) \rightarrow \int_0^{\infty} \int_0^{t_n} \cdots \int_0^{t_2} t_1 \cdots t_{n-1}t_n^2 \exp\left(-\frac 12 t_n^2 \right) dt_1\cdots dt_n. $$ This integral is in fact as @Did computed in the last comment to the question, $$ \frac1{2^{n-1}(n-1)!} \int_0^{\infty} t_n^{2n} \exp\left(-\frac 12 t_n^2 \right) dt_n=\frac{(2n-1)!!}{2^{n-1}(n-1)!} \sqrt{\frac{\pi}2}\sim \sqrt{2n}. $$