Is coupon collection concentrated?

263 Views Asked by At

Say you buy $m$ coupons out of $n$ possible types, and each is given you randomly with replacement. It's standard that the amount of time you need to wait to get $k$ coupons is approximately in expectation $$ n[\log (n) - \log (n - k)] $$ and inverting this you'd expect that the number of types of coupons you get after buying $m$ of them is about $$ n(1 - e^{-m/n}) $$ My question is, do you get concentration about this quantity? That is, I'm looking for a statement like: with high probability, if you buy $m$ coupons you are unlikely to be very far from having exactly $n(1 - e^{-m/n})$ distinct types of coupons. I'd be ok with a factor difference, but ideally I want to be only lower order terms away from this expectation.

1

There are 1 best solutions below

1
On

The expected distinct number collected is $$n\left(1- \left(1-\frac1n\right)^m\right)$$ which, as you say, can be close to $n\left(1- e^{-m/n}\right)$

The variance of the distinct number collected is $$n\left(1-\frac1n\right)^m + n^2\left(1-\frac1n\right)\left(1-\frac2n\right)^m - n^2\left(1-\frac1n\right)^{2m}$$ which empirically for large $m=n$ seems to be about $0.0972n$, while for large $m \approx n \log(n)$ seems to be about $1$, and for large $m\approx n H_n$ seems to be about $0.56$. For large $m \approx \sqrt{n}$, the variance seems to be about $0.5$

You could for example use the standard deviation (the square root of the variance) as a measure of spread