How often was the most frequent coupon chosen?

885 Views Asked by At

In the coupon collector's problem, let $T_n$ denote the time of completion for a collection of $n$ coupons. At time $T_n$, each coupon $k$ has been collected $C_k^{n}\geqslant 1$ times. Consider how often the most frequently chosen coupon, was chosen, that is, the random variable $$C^*_n=\max_{1\leqslant k\leqslant n}C_k^{n}. $$

Can one compute $E(C^*_n)$? What is a simple asymptotics of $E(C^*_n)$ when $n$ grows large? Does $C_n^*/E(C^*_n)$ converge in distribution and, if it does, what is its limit?

Note that $C_1^n+C_2^n+\cdots+C_n^n=T_n$. Since $E(T_n)=nH_n$ where $H_n=\sum\limits_{k=1}^n\frac1k$ denotes the $n$th harmonic number, such that $H_n=\log n+O(1)$, one has $E(C_n^*)\geqslant\log n$ and $E(C_n^*)=O(n\log n)$.


In a somewhat more ambitious version of this question, consider the nondecreasing rearrangement $C_{(1)}^n\leqslant C_{(2)}^n\leqslant\cdots\leqslant C_{(n)}^n$ of $(C_1^n,C_2^n,\ldots,C_n^n)$. Thus, $C_{(1)}^n=1$ and $C_{(n)}^n=C_n^*$.

Can one compute (or, get some simple asymptotics of) each $E(C^n_{(k)})$? And what is the "profile" of the random vector $(C_{(1)}^n,C_{(2)}^n,\ldots,C_{(n)}^n)$ when $n$ grows large? To be specific:

Does the random vector $$\left(\frac{C_{(1)}^n}{C_{(n)}^n},\frac{C_{(2)}^n}{C_{(n)}^n},\ldots,\frac{C_{(n)}^n}{C_{(n)}^n}\right)$$ converge in distribution and, if it does, what is its limit?

Edit: Amy N. Myers and Herbert S. Wilf (Some New Aspects of the Coupon Collector's Problem, SIAM Review 48(3), 2006) provide explicit formulas for the distribution and the mean of the number $S_n$ of singletons. In the notations above, $S_n$ is the size of the set of $1\leqslant k\leqslant n$ such that $C^n_k=1$, and also the maximum of the set of $1\leqslant k\leqslant n$ such that $C^n_{(k)}=1$. Myers and Wilf show that, for every $i$, $$P(S_n=i)=i{n\choose i}\int_0^\infty x^{i-1}(e^x-1-x)^{n-i}e^{-nx}dx,$$ and they deduce from this the esthetically pleasing identity $$E(S_n)=H_n.$$

2

There are 2 best solutions below

2
On

Here is a simple simulation in Python 3:

import random

def completeCollection(n):
    #collects random "coupons"
    #in range 0,1,...,n-1
    #until each is encountered at least once
    #returns the list of counts

    counts = [0]*n
    collected = set()
    while len(collected) < n:
        coupon = random.randint(0,n-1)
        counts[coupon] += 1
        collected.add(coupon)
    return counts

def expectedMax(n,trials):
    #estimates the number of times
    #the most collected coupon is
    #collected while collecting n coupons
    count = 0
    for i in range(trials):
        count += max(completeCollection(n))
    return count/trials

I evaluated [expectedMax(n,1000) for n in range(1,101)] and obtained:

[1.0, 1.985, 2.79, 3.403, 4.052, 4.575, 4.784, 5.145, 5.45, 5.672, 6.008, 6.198, 6.515, 6.508, 6.765, 7.006, 7.074, 7.207, 7.466, 7.416, 7.534, 7.711, 7.812, 7.992, 8.049, 8.012, 8.268, 8.467, 8.408, 8.467, 8.604, 8.54, 8.779, 8.666, 8.804, 9.121, 9.076, 9.033, 9.179, 9.289, 9.344, 9.33, 9.479, 9.456, 9.601, 9.613, 9.644, 9.836, 9.693, 9.82, 9.886, 9.944, 10.044, 10.124, 10.161, 10.113, 10.039, 10.273, 10.334, 10.345, 10.317, 10.454, 10.519, 10.483, 10.491, 10.496, 10.617, 10.593, 10.719, 10.859, 10.885, 10.782, 10.858, 10.841, 10.87, 10.804, 11.005, 11.005, 10.993, 11.105, 11.092, 11.121, 11.106, 11.159, 11.198, 11.209, 11.291, 11.393, 11.444, 11.428, 11.395, 11.584, 11.533, 11.511, 11.545, 11.559, 11.601, 11.585, 11.61,11.617]

Plotted this looks like:

enter image description here

which looks somewhat logarithmic. It also looks somewhat like a square root, but I evaluated expectedMax(1000,1000) and got 17.835, which suggests that the growth rate is slower than a square root.

3
On

Each coupon appears $\log n$ times on average.

If I treat the number for a single coupon as a binomial, the variance is also asymptotically $\log n$.

Now treat each of the counts as a normal variable $\mathcal N(\log n,\log n) $.

Expectation of the maximum of gaussian random variables shows the maximum of $n$ independent standard normal random variables to be around $\sqrt{2\log n}$, so the final answer would be $$\mu+\sigma \sqrt{2\log n}=\log n+\sqrt{\log n}\sqrt{2\log n}=(1+\sqrt2)\log n$$

Edit:

On the other hand, here is an argument that the limit is $e\log n$.

Treat each coupon count as an independent Poisson variable with parameter $\lambda=\log n$. The tail of the distribution is given here as $$e^{-\lambda}\frac{(e\lambda))^x}{x^x}=\frac1n\left(\frac{e\log n}x\right)^x$$ The chance that at least one coupon is inside this tail would be around $n$ times that. So the probability moves from near-zero to near-1 when $x$ is near $e\log n$.