Consider coupons numbered $1, 2, ..., n,$ which you collect until you have all numbers. Each time you get a coupon, its number is independent of previous numbers and all numbers are equally likely. Let $N$ be the number of coupons you need, and show that
(a) $\mathbb E[N] = n\sum^n_{k=1}\frac{1}{k} \sim n\log n$ for large $n$,
(b) there is a constant c such that $\operatorname{Var}[N] \sim cn^2$ for large $n$. What is $c$?
This is a famous problem called the collector's problem. Here follows one solution for part a) of the question:
Let $N$ denote the (random) number of coupons that we need to purchase in order to complete our collection. We can write $N = X_1 + X_2 + . . . + X_n,$ where for any $k = 1, 2, . . . , n,$ $X_k$ denotes the additional number of coupons that we need to purchase to pass from $k − 1$ to $k$ different types of coupons in our collection. Trivially $X_1 = 1$ and, since we are considering the case of a uniform distribution, it follows that when $k$ distinct types of coupons have been collected, a new coupon purchased will be of a distinct type with probability equal to $\frac{n-k}{n}$. By the independence assumption, we get that the random variable $X_k$, for $k ∈ {2,...,N}$, is independent from the other variables and has a geometric law with parameter $\frac{n-k+1}{n}$. The expected number of coupons that we have to buy to complete the collection will be therefore
$$\mathbb{E}[N]=\mathbb{E}[X_1]+\mathbb{E}[X_2]+...+\mathbb{E}[X_n] \tag{1}$$
$$=1+\frac{n}{n-1}+\frac{n}{n-2}+...+\frac{n}{2}+n \tag{2}$$
$$=n\sum_{k=1}^n\frac{1}{k}. \tag{3}$$
I don't understand this solution. I have the following questions:
- Can someone explain the problem statement? What collection is supposed to be completed? What do they mean by compeleted?
- Why is it trivial that $X_1=1?$
- How can we conclude that there is a geometric distribution? First they say that they are considering the case of uniform distribution and then they switch to geometric.
- I don't see how the $X_k$'s can be independent. If I purchase a coupon, then I have less purchases to make, which changes my probability.
- How does one show that $(3)=n\log{n}$ as $n\rightarrow \infty?$
The problem is basically saying you are trying to complete a collection of some sort. Examples can include stamps, card trading games.
I'll describe this using the Pokemon example. This is modelled mathematically as follows: in the whole game, there are a fixed number of different cards. In the Pokemon version, there were 150 different Pokemon you could collect. Each time you purchase a card, you get a card with a picture of a Pokemon on it. This may be one you already have, or a new one.
The collection to be completed is the collection of Pokemon cards, all 150 of them. This is what they mean by the collection being completed - you are done when you have all 150 of them. In your problem, this number is $n$. (Also they use the word coupon instead of cards.)
Think about what $X_1$ means. It is "the additional number of coupons that we need to purchase to pass from $0$ to $1$ different types of coupons in our collection". Well, the moment you buy your first coupon, you have a new type of coupon, since you had none to start with. So it takes you $1$ purchase to go from $0$ types to $1$ type. So $X_1=1$.
From Wikipedia, the geometric distribution gives the probability that the first occurrence of success requires $k$ independent trials, each with success probability $p$.
Here we are defining "success" to mean "getting a new type of coupon". Each time you buy a coupon, it is independent from anything you have previously bought, and the probability that it is a new type of coupon is some constant $p$. $X_k$ tells us how many purchases (i.e. independent trials) it will take until the first occurrence of a success (getting a new type of coupon). This fits the above description, hence it is geometric.
I don't fully understand your counter example here. They are independent because the number of coupons you will need to buy to go from having $2$ types to $3$ types has nothing to do with how many you will need to buy to go from $3$ types to $4$ types.
The probability does change when going from trying to accomplish one task to trying to accomplish the other, but whatever happened when trying to accomplish the first task does not impact the second at all.
When trying to go from $2$ types to $3$, say you had to buy $m$ coupons. Then the first $m-1$ of these were just duplicates of the $2$ types you already had, and the $m$th one was a new type. How should this impact what will happen when trying to go from $3$ types to $4$ types?
For large $n$, we can bound the sum with some integrals, which can be pictured by drawing the plot of $y=1/x$ from $x=0$ to $x=n+1$, and drawing rectangles and finding two different ways to represent the sum. We get $$\int_1^{n+1}\frac1k\,dk<\sum_{k=1}^n\frac1k<1+\int_1^n \frac1k\,dk\\\implies n\log(n+1)<n\sum_{k=1}^n\frac1k<n+n\log n$$ For large $n$, since $n\log n$ is a larger term than $n$, we can ignore the $n$, and so the bounds either side look more and more like $n\log n$. So $$n\sum_{k=1}^n\frac1k\sim n\log n$$ Here is a plot to demonstrate these bounds.