Coupon collector's problem: solution through an alternate route leads to an expression that is hard to connect to the known result.


The identity I want help proving is the following (given $m$ probabilities, $p_j$ such that $\sum_j p_j = 1$): $$ \int\limits_0^\infty t \sum\limits_j \left(\prod\limits_{k \neq j}(1-e^{-p_k t}) \right)e^{-p_jt}p_j dt = \sum\limits_j\frac 1 p_j - \sum\limits_{i<j}\frac {1}{p_i+p_j} + \dots +(-1)^{m-1} \frac{1}{p_1+\dots+p_m}$$

For background and motivation, see below.
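Before diving in, a quick numerical sanity check of the identity, sketched with scipy for an arbitrary example $p = (0.5, 0.3, 0.2)$ (any probabilities summing to 1 would do):

```python
import numpy as np
from itertools import combinations
from scipy.integrate import quad

p = np.array([0.5, 0.3, 0.2])  # arbitrary example probabilities, sum to 1
m = len(p)

def f_X(t):
    # Integrand density: sum_j p_j e^{-p_j t} prod_{k != j} (1 - e^{-p_k t})
    return sum(p[j] * np.exp(-p[j] * t)
               * np.prod([1 - np.exp(-p[k] * t) for k in range(m) if k != j])
               for j in range(m))

# Left-hand side: integral of t * f_X(t) over (0, inf)
lhs, _ = quad(lambda t: t * f_X(t), 0, np.inf)

# Right-hand side: alternating inclusion-exclusion sum over subsets of p
rhs = sum((-1) ** (k - 1) / sum(c)
          for k in range(1, m + 1)
          for c in combinations(p, k))

print(lhs, rhs)  # both ≈ 6.6548
```

The two sides agree to quadrature accuracy, so the identity is at least numerically sound.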


In Example 5.17 of Sheldon Ross's book Introduction to Probability Models, the coupon collector's problem is tackled for the general case where the probability of drawing coupon $j$ is given by $p_j$ and of course, $\sum\limits_j p_j = 1$. Now, he defines $X_j$ as the first time a coupon of type $j$ is observed, where the $j$th coupon arrives in accordance with a Poisson process with rate $p_j$. We're interested in the time it takes to collect all coupons, $X$. So we get:

$$X = \max_{1\leq j \leq m}X_j$$

Further, since the $X_j$ are independent (discussion on that here), we get:

$$F_X(t) = P(X\le t) = P(X_j\le t \; \forall \; j) = \prod\limits_{j=1}^{m}(1-e^{-p_j t})\tag{1}$$
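For what it's worth, equation (1) can be checked by direct simulation (an illustrative sketch; $p = (0.5, 0.3, 0.2)$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])  # arbitrary example probabilities

# First-arrival times X_j ~ Exp(p_j) (numpy takes the scale 1/p_j); X = max_j X_j
n = 200_000
X = rng.exponential(1 / p, size=(n, len(p))).max(axis=1)

for t in (1.0, 3.0, 10.0):
    empirical = (X <= t).mean()
    formula = np.prod(1 - np.exp(-p * t))
    print(t, empirical, formula)  # empirical ≈ formula
```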

Now, Ross uses the expression $E(X) = \int\limits_0^\infty S_X(t)dt$, where $S_X(t) = 1 - F_X(t)$ is the survival function, to get:

$$E(X) = \int\limits_{0}^{\infty}\left(1-\prod\limits_{j=1}^{m}(1-e^{-p_j t})\right) dt = \sum\limits_j\frac 1 p_j - \sum\limits_{i<j}\frac {1}{p_i+p_j} + \dots +(-1)^{m-1} \frac{1}{p_1+\dots+p_m}\tag{2}$$
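Numerically, the two sides of (2) do agree; here is a sketch with scipy for an example choice of $p$:

```python
import numpy as np
from itertools import combinations
from scipy.integrate import quad

p = np.array([0.5, 0.3, 0.2])  # arbitrary example probabilities

# Survival function 1 - F_X(t) from equation (1)
survival = lambda t: 1 - np.prod(1 - np.exp(-p * t))
lhs, _ = quad(survival, 0, np.inf)

# Alternating inclusion-exclusion sum over subsets of p
rhs = sum((-1) ** (k - 1) / sum(c)
          for k in range(1, len(p) + 1)
          for c in combinations(p, k))

print(lhs, rhs)  # both ≈ 6.6548
```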

Now, I want to get this same result using the old-fashioned definition of the expected value, $E(X) = \int_0^\infty t f_X(t)\,dt$. For this, I differentiate equation (1) to get the PDF of $X$. First, take the logarithm of both sides.

$$\log(F_X(t)) = \sum\limits_j \log(1-e^{-p_j t})$$

Now differentiate with respect to $t$.

$$\frac{f_X(t)}{F_X(t)} = \sum\limits_j \frac{p_j e^{-p_j t}}{1-e^{-p_j t}}$$

Finally yielding:

$$f_X(t) = \sum\limits_j \left(\prod\limits_{k \neq j}(1-e^{-p_k t}) \right)e^{-p_jt}p_j$$
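As a check on the differentiation, this closed form for $f_X$ matches a central-difference derivative of $F_X$ (a sketch, again with an arbitrary example $p$):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])  # arbitrary example probabilities
m = len(p)

def F(t):
    # CDF from equation (1)
    return np.prod(1 - np.exp(-p * t))

def f(t):
    # Claimed density: sum_j p_j e^{-p_j t} prod_{k != j} (1 - e^{-p_k t})
    return sum(p[j] * np.exp(-p[j] * t)
               * np.prod([1 - np.exp(-p[k] * t) for k in range(m) if k != j])
               for j in range(m))

h = 1e-6  # step for the central difference
for t in (1.0, 3.0, 10.0):
    numeric = (F(t + h) - F(t - h)) / (2 * h)
    print(t, numeric, f(t))  # numeric ≈ f(t)
```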

Using this, we get an alternate expression for the expectation:

$$E(X) = \int\limits_0^\infty t f_X(t) dt = \int\limits_0^\infty t \sum\limits_j \left(\prod\limits_{k \neq j}(1-e^{-p_k t}) \right)e^{-p_jt}p_j dt$$

This should lead to the same expression as in equation (2); however, I don't know where to start. Why do I want to go through this alternate route? Because I hope to find an expression for the variance as well, and for that I need $E(X^2)$. I thought I'd first tackle the easier quantity, $E(X)$, for which we know there is a nice expression.


There are 2 answers below.

BEST ANSWER

For brevity let $F = F_X$. For $L>0$ let $$I_L = \int_{0}^{L}tf_X(t)\,dt.$$ Using integration by parts, it follows that \begin{align*} I_L &= \int_{0}^{L}t F'(t)\, dt \\ &= tF(t)\big|_{0}^{L} - \int_{0}^{L} F(t)\, dt \\ &= L(F(L)-1) + J_L \end{align*} where, expanding $1-F$ by inclusion-exclusion and integrating term by term, $$J_L = \int_{0}^{L}(1-F(t))\,dt = \sum_{i=1}^{m} (-1)^{i-1} \sum_{1\le j_1<\cdots<j_i\le m} \frac{1 - e^{-(p_{j_1}+\cdots+p_{j_i})L}}{p_{j_1}+\cdots+p_{j_i}}.$$ Show that $$\lim_{L\to\infty} L(F(L)-1) = 0.$$ Then it follows that $$\lim_{L\to\infty} I_L = \lim_{L\to\infty} J_L = \sum_{i=1}^{m} (-1)^{i-1} \sum_{1\le j_1<\cdots<j_i\le m} \frac{1}{p_{j_1}+\cdots+p_{j_i}}.$$
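To illustrate this numerically (a sketch, with an arbitrary $p = (0.5, 0.3, 0.2)$): $I_L$ matches $L(F(L)-1) + J_L$ for each $L$, and the boundary term dies off quickly:

```python
import numpy as np
from itertools import combinations
from scipy.integrate import quad

p = np.array([0.5, 0.3, 0.2])  # arbitrary example probabilities
m = len(p)

F = lambda t: np.prod(1 - np.exp(-p * t))  # CDF of X

def f(t):
    # Density of X, as derived in the question
    return sum(p[j] * np.exp(-p[j] * t)
               * np.prod([1 - np.exp(-p[k] * t) for k in range(m) if k != j])
               for j in range(m))

def J(L):
    # Inclusion-exclusion expression for the truncated survival integral
    return sum((-1) ** (k - 1) * (1 - np.exp(-sum(c) * L)) / sum(c)
               for k in range(1, m + 1)
               for c in combinations(p, k))

for L in (5.0, 20.0, 80.0):
    I_L, _ = quad(lambda t: t * f(t), 0, L)
    boundary = L * (F(L) - 1)
    print(L, I_L, boundary + J(L), boundary)  # I_L matches; boundary -> 0
```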

For $E(X^2)$, you might consider doing what I did here, but applying integration by parts twice.


I have an attempt at calculating the variance using the technique @BGM pointed out. The attempt has so far been unsuccessful, but I wanted to post it for my own reference, and an answer seemed the best place given how long the question already is. As pointed out by @BGM,

$$E(X^2) = \int\limits_0^\infty 2u\left(1-F(u)\right)du = \int\limits_0^\infty 2u\left(1-\prod\limits_{j=1}^m (1-e^{-p_j u})\right)du$$

$$ = \int\limits_0^\infty 2u\left(\sum e^{-{p_ju}} - \sum_{i<j} e^{-{(p_j+p_i)u}}+\dots+(-1)^{m+1}e^{-{(p_1+p_2+\dots+p_m)u}}\right)du$$

Now we know,

$$I = \int\limits_0^\infty u e^{-pu}du = \frac{1}{p^2}$$

$$\implies E(X^2) = 2\left(\sum \frac{1}{p_j^2} - \sum_{i<j} \frac{1}{(p_i+p_j)^2} +\dots\right)$$

Trying to validate it for the case $p_j = \frac 1 m$ leads to some trouble. See here.
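For reference, here is a numerical cross-check of $E(X^2)$ for an example $p = (0.5, 0.3, 0.2)$; note that the signs in the closed form must alternate, matching the expansion of $1-\prod_j(1-e^{-p_j u})$ above:

```python
import numpy as np
from itertools import combinations
from scipy.integrate import quad

p = np.array([0.5, 0.3, 0.2])  # arbitrary example probabilities

# Survival function 1 - F(u)
survival = lambda u: 1 - np.prod(1 - np.exp(-p * u))

# E(X^2) = int_0^inf 2u (1 - F(u)) du, computed numerically
numeric, _ = quad(lambda u: 2 * u * survival(u), 0, np.inf)

# Alternating closed form: 2 * sum over subsets of (-1)^(k-1) / (sum p)^2
closed = 2 * sum((-1) ** (k - 1) / sum(c) ** 2
                 for k in range(1, len(p) + 1)
                 for c in combinations(p, k))

print(numeric, closed)  # both ≈ 67.0156
```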