Collisions in a Sample

62 Views Asked by At

Based on birthday paradox;

Let $d$ be the set of elements randomly chosen from a set of $n$ distinct elements then

a) What is expected number of unique elements in $d$ (remaining will be repetition of unique elements)?

b) What is expected maximum count/ frequency of occurrence of an element in $d$?

c) How large d will be such that all distinct elements of $n$ appear in $d$ atleast once?

for simplicity, Let

$n = [0,1,2,3...99]$

$d = randomly\ chosen\ 100\ elements\ from\ n$

1

There are 1 best solutions below

2
On BEST ANSWER

(a) If you draw $k$ times with replacement from $n$ possible values, the expected number of values drawn exactly once is $k\left(1-\dfrac1n\right)^{k-1}$, which is maximised when $n=k-1$ or $n=k$. In the case $n=k=100$ this is about $36.97$.

Similarly the expected number of values drawn zero times is $n\left(1-\dfrac1n\right)^{k}$ which with $n=k=100$ is about $36.60$. The expected number of values drawn two or more times is $n- (k+n-1)\left(1-\dfrac1n\right)^{k-1}$ which with $n=k=100$ is about $26.42$.

(b) I am not aware of a closed form expression but, for $n=k=100$, simulation suggests that the value drawn most often is drawn an average of about $4.23$ times.

(c) If you draw until each of $n$ values has been drawn at least once, this is the coupon collector's problem and the expected number of draws needed number is $n\,H_n=n\sum\limits_{m=1}^n \frac1m$ which with $n=100$ is about $518.74$.