Proof that the limit of a sum of exponentially distributed random variables follows a Gumbel distribution


How can I prove that given $X_k \sim \text{Exp}(1/k),$

$$\lim_{n\to\infty}\left(-\log(n) + \sum_{k=1}^n X_k \right)\sim \text{Gumbel}(\mu =0,\beta = 1)$$

?

I see that $\log(n)$ is the integral of $1/x$, that the expectation of the Gumbel distribution is the Euler–Mascheroni constant, and that the means of the exponential distributions sum to a harmonic number, but these intuitions are a far cry from proving the above is true.


On BEST ANSWER

Consider the sequence of independent random variables $$Y_k = X_k - \frac{1}{k},$$ which have mean $0$. Then let $$S_n = -\log n + \sum_{k=1}^n X_k = -\log n + H_n + \sum_{k=1}^n Y_k,$$ where $H_n = \sum_{k=1}^n 1/k$ is the $n$-th harmonic number, and observe that for $|t| < 1$ $$\begin{align} M_{S_n}(t) &= \operatorname{E}[e^{t S_n}] \\ &= e^{t(H_n - \log n)} \operatorname{E}\left[\prod_{k=1}^n e^{t Y_k} \right] \\ &\overset{\text{ind}}{=} e^{t(H_n - \log n)} \prod_{k=1}^n M_{Y_k}(t) \\ &= e^{t(H_n - \log n)} \prod_{k=1}^n \frac{e^{-t/k}}{1 - t/k}. \end{align}$$ Since each factor satisfies $e^{-t/k}/(1 - t/k) = 1 + O(1/k^2)$, the infinite product converges for $|t| < 1$, and $H_n - \log n \to \gamma$ as $n \to \infty$; hence the MGF of $S_\infty = \lim S_n$ is defined in a neighborhood of $0$. Thus $$M_{S_\infty}(t) = \lim_{n \to \infty} e^{t(H_n - \log n)} \prod_{k=1}^n \frac{e^{-t/k}}{1-t/k} = e^{\gamma t} \prod_{k=1}^\infty \frac{e^{-t/k}}{1 - t/k}.$$
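As a quick numerical sanity check (my own sketch, not part of the original answer), one can evaluate the finite-$n$ MGF above and watch it converge to $\Gamma(1-t)$ as $n$ grows:

```python
import math

def mgf_Sn(t: float, n: int) -> float:
    """MGF of S_n = -log(n) + sum_{k=1}^n X_k, where X_k ~ Exp with mean 1/k.

    Uses M_{S_n}(t) = exp(t * (H_n - log n)) * prod_k exp(-t/k) / (1 - t/k),
    valid for t < 1 (so that t/k < 1 for every k >= 1).
    """
    h_n = sum(1.0 / k for k in range(1, n + 1))
    log_m = t * (h_n - math.log(n))
    for k in range(1, n + 1):
        log_m += -t / k - math.log(1.0 - t / k)
    return math.exp(log_m)

t = 0.5
for n in (10, 100, 10_000):
    print(f"n = {n:6d}: M_Sn(t) = {mgf_Sn(t, n):.6f}")
print(f"Gamma(1 - t)       = {math.gamma(1.0 - t):.6f}")
```

For $t = 0.5$ the values approach $\Gamma(1/2) = \sqrt{\pi}$, in line with the limit derived below.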

Now recalling the Weierstrass definition of the gamma function $$\Gamma(z) = \frac{e^{-\gamma z}}{z} \prod_{k=1}^\infty \frac{e^{z/k}}{1+z/k},$$ we see $$M_{S_\infty}(t) = (-t) \Gamma(-t) = \Gamma(1-t),$$ which is the MGF of a Gumbel distribution with location $0$ and scale $1$: $$f_G(g) = e^{-(g + e^{-g})}$$ has MGF $$M_G(t) = \operatorname{E}[e^{tG}] = \int_{g = -\infty}^\infty e^{tg} e^{-(g+e^{-g})} \, dg = \int_{u = 0}^\infty u^{-t} e^{-u} \, du = \Gamma(1-t). \tag{$u = e^{-g}$}$$
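To illustrate the full claim (again my own sketch, with illustrative parameters), a Monte Carlo simulation of $S_n$ can be compared against the standard Gumbel CDF $F(x) = e^{-e^{-x}}$:

```python
import math
import random

random.seed(0)

n, samples = 200, 10_000

# Draw S_n = -log(n) + sum_k X_k, with X_k ~ Exp(mean 1/k), i.e. rate k,
# which is random.expovariate(k) in the standard library.
draws = sorted(
    -math.log(n) + sum(random.expovariate(k) for k in range(1, n + 1))
    for _ in range(samples)
)

# Kolmogorov-Smirnov-style distance between the empirical CDF and the
# standard Gumbel CDF F(x) = exp(-exp(-x)).
gap = max(
    abs((i + 1) / samples - math.exp(-math.exp(-x)))
    for i, x in enumerate(draws)
)
print(f"max CDF deviation over {samples} samples: {gap:.4f}")
```

With these parameters the deviation is on the order of the Monte Carlo sampling noise ($\approx 1/\sqrt{\text{samples}}$), consistent with $S_n$ being approximately standard Gumbel already at moderate $n$.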

ANOTHER ANSWER

The Gumbel distribution as a limit of a sum of variables that follow an exponential distribution occurs in an article by Lars Holst$^{1}$.

The connection can be made as follows:

Consider $n$ bins that are filled with balls, each bin according to its own independent Poisson process with an average of one ball per bin per time unit.

The image below gives an example for 9 bins. It illustrates the bins as horizontal strips in a 2D plane filled with points according to a Poisson process (this is a simplified version of the illustration used by Holst, who generalizes it by letting the strips vary in size). The horizontal direction can be viewed as 'time'. A ball is marked red when it is the first to fill its bin.

[Image: 9 bins drawn as horizontal strips of a 2D Poisson point process; the first ball to land in each bin is marked red.]

The markings $s_i$ and $t_i$ in the image relate to variables explained below. They are used to express the waiting time to fill all the bins with at least 1 ball.

We can approach the problem of finding the distribution of this waiting time to fill all the bins in two ways.

  1. The maximum of a sample of variables that follow an exponential distribution

    The waiting time for a particular bin to be filled follows an exponential distribution $s_i \sim \text{Exp}(1)$. The waiting time for all bins to be filled is the maximum of these $n$ exponentially distributed variables.

  2. Sum of variables that follow an exponential distribution

    We can express the waiting time for all bins to be filled as the sum of the waiting times to get one step further.

    • The waiting time for a first bin to be filled follows an exponential distribution with a rate parameter equal to the number of empty bins: $t_1 \sim \text{Exp}(n)$.

    • The waiting time for a second bin to be filled, after one bin is already filled, follows an exponential distribution with a rate parameter equal to the number of empty bins, which is now $n-1$. So $t_2 \sim \text{Exp}(n-1)$.

    • Continuing through all $n$ steps, the waiting time to go from $k-1$ filled bins to $k$ filled bins follows an exponential distribution $t_k \sim \text{Exp}(n+1-k)$.

    The total waiting time is the sum of these steps. (These step times are exponentially distributed and independent of each other, by the memoryless property of the Poisson processes: the waiting time for the previous steps does not influence the waiting time for a later step.)

So the following two variables describe the same waiting time, and thus follow the same distribution:

$$\max(s_1, s_2, \dots, s_n) \qquad \sim \qquad \sum_{k=1}^n t_k$$

We know that the left-hand side, after subtracting $\log(n)$, approaches a Gumbel distribution$^2$ as $n \to \infty$, so the same is true for the right-hand side. Reindexing with $j = n+1-k$ shows that $\sum_{k=1}^n t_k$ has the same distribution as $\sum_{j=1}^n X_j$ with $X_j \sim \text{Exp}(1/j)$ (mean $1/j$, rate $j$), which is exactly the sum in the question.
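The equality in distribution between the maximum and the sum is easy to check empirically. The following sketch (my own, with illustrative parameters) simulates both sides for $n = 9$ bins and compares sample means and medians:

```python
import random

random.seed(1)
n, samples = 9, 50_000

# Way 1: the time to fill every bin is the maximum of n i.i.d. Exp(1)
# bin-filling times s_1, ..., s_n.
maxima = sorted(
    max(random.expovariate(1.0) for _ in range(n)) for _ in range(samples)
)

# Way 2: the same time as the sum of step waiting times t_k ~ Exp(n + 1 - k),
# where the rate is the number of still-empty bins before step k.
sums = sorted(
    sum(random.expovariate(n + 1 - k) for k in range(1, n + 1))
    for _ in range(samples)
)

mean1 = sum(maxima) / samples
mean2 = sum(sums) / samples
print(f"mean of max(s_i): {mean1:.3f}   mean of sum(t_k): {mean2:.3f}")
print(f"median of max(s_i): {maxima[samples // 2]:.3f}   "
      f"median of sum(t_k): {sums[samples // 2]:.3f}")
```

Both sample means should be close to $H_9 \approx 2.829$, matching the identity $\operatorname{E}[\max_i s_i] = \sum_k \operatorname{E}[t_k] = H_n$.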


$^1$ Holst, Lars. "Extreme value distributions for random coupon collector and birthday problems." Extremes 4.2 (2001): 129-145.

$^2$ Gumbel, E. J. "Les valeurs extrêmes des distributions statistiques." Annales de l'institut Henri Poincaré. Vol. 5. No. 2. 1935.