Distribution of a sequence of maximums generated using i.i.d. Normal variables


I am trying to understand the distribution of the following random process. Here is how the sequence is generated: draw a sample of size $k$ from i.i.d. normal random variables, take its maximum, and call it $M_{1}$. Repeating this procedure yields a sequence of maxima $M_{1}, M_{2}, \ldots, M_{n}$. These $n$ values are independent and identically distributed, since the underlying generation process is the same each time. I tried to derive an analytical expression for this common distribution, i.e., the distribution of $M_{i}$ for every $i \in \{1, 2, \ldots, n\}$.

My question is this: what happens to the mean and variance of this distribution as $n \rightarrow \infty$ and the sample size, $k$, is varied?

The law of large numbers tells us that the sample mean and variance of the sequence converge to the mean and variance of $M_{i}$, and by the CLT the distribution of the sample mean is approximately normal. I am stuck at finding the mean and variance of $M_{i}$. It is possible to write down the pdf of $M_{i}$ for general $k$, but the resulting expressions look difficult to integrate; my attempts to compute the mean and variance this way went nowhere. A Google search turned up extreme value theory (EVT), which describes the limiting behavior as $k \rightarrow \infty$; I am not sure whether it applies here.
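For concreteness, the pdf referred to above follows directly from independence: the maximum is below $x$ exactly when all $k$ draws are, so

```latex
% CDF and pdf of M_i = max(X_1, ..., X_k) for i.i.d. standard normal X_j
P(M_i \le x) = P(X_1 \le x, \dots, X_k \le x) = \Phi(x)^k,
\qquad
f_{M_i}(x) = \frac{d}{dx}\,\Phi(x)^k = k\,\Phi(x)^{k-1}\,\varphi(x),
```

where $\Phi$ and $\varphi$ are the standard normal CDF and pdf. It is the moments $\int x\, f_{M_i}(x)\,dx$ and $\int x^2 f_{M_i}(x)\,dx$ that lack convenient closed forms for general $k$.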

Also, I ran some simulations in R and can see that the sample mean and variance do indeed converge, to values that depend on $k$. Could someone help derive an expression for the mean and variance? Any other insights are much appreciated. I did notice something interesting: as $k$ increases, the mean increases while the variance decreases.
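The simulation described above can be sketched as follows (in Python with NumPy rather than R; the function name and sample sizes are mine, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_max_moments(k, n=100_000):
    """Draw n maxima, each taken over a fresh sample of k standard
    normals, and return the empirical mean and variance of the maxima."""
    maxima = rng.standard_normal((n, k)).max(axis=1)
    return maxima.mean(), maxima.var()

for k in (2, 5, 20):
    m, v = simulate_max_moments(k)
    print(f"k={k:2d}  mean={m:.3f}  var={v:.3f}")
```

Running this reproduces the observed pattern: the empirical mean grows with $k$ while the empirical variance shrinks.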

Best Answer

We can certainly create a table for small $k$, numerically integrating for the mean and variance when the underlying distribution is standard normal; then we can exploit location-scale transformations to get the moments when the underlying distribution is normal with arbitrary mean and variance. Specifically, if $$X_j = \mu + \sigma Z_j \sim \operatorname{Normal}(\mu, \sigma^2)$$ with $Z_j$ standard normal, and $$M_i(k) = \max_{j=1}^k X_j = \mu + \sigma \max_{j=1}^k Z_j = \mu + \sigma M_i^*(k),$$ then $\operatorname{E}[M_i(k)] = \mu + \sigma \operatorname{E}[M_i^*(k)]$ and $\operatorname{Var}[M_i(k)] = \sigma^2 \operatorname{Var}[M_i^*(k)]$, so computing the mean and variance of $M_i^*(k)$, the maximum order statistic of $k$ i.i.d. standard normals, suffices. It is not too difficult to compute these to high precision. A table is provided as follows up to $k = 40$: $$\begin{array}{c|cc} k & \operatorname{E}[M_i^*(k)] & \operatorname{Var}[M_i^*(k)] \\ \hline 1 & 0\hphantom{.0000000000000000000} & 1\hphantom{.00000000000000000000} \\ 2 & 0.5641895835477562869 & 0.68169011381620932846 \\ 3 & 0.8462843753216344304 & 0.55946720379736701380 \\ 4 & 1.0293753730039641321 & 0.49171523687474176068 \\ 5 & 1.1629644736405196128 & 0.44753406902066198877 \\ 6 & 1.2672063606114712976 & 0.41592710898324811918 \\ 7 & 1.3521783756069043992 & 0.39191777612675045282 \\ 8 & 1.4236003060452777531 & 0.37289714328672899422 \\ 9 & 1.4850131622092370063 & 0.35735332635781334373 \\ 10 & 1.5387527308351728560 & 0.34434382326069025507 \\ 11 & 1.5864363519080001689 & 0.33324744270295743512 \\ 12 & 1.6292276398719129903 & 0.32363638704764511498 \\ 13 & 1.6679901770491274980 & 0.31520538421231131148 \\ 14 & 1.7033815540999765215 & 0.30773010247051352042 \\ 15 & 1.7359134449410374337 & 0.30104157031389397523 \\ 16 & 1.7659913930547879673 & 0.29500980901031979788 \\ 17 & 1.7939419808826908735 & 0.28953300368769581952 \\ 18 & 1.8200318789687221046 & 0.28453012974137323777 \\ 19 & 1.8444815116038246581 & 0.27993580492832891811 \\ 20 & 1.8674750597983204847 & 0.27569661561853123249 \\ 21 & 1.8891679149213104844 & 0.27176844368099078145 
\\ 22 & 1.9096923216814163261 & 0.26811448752380604676 \\ 23 & 1.9291617116425034366 & 0.26470377412772997713 \\ 24 & 1.9476740742256781348 & 0.26151002449149128630 \\ 25 & 1.9653146097535565808 & 0.25851077750621494386 \\ 26 & 1.9821578397613119821 & 0.25568670553246791801 \\ 27 & 1.9982693020065785915 & 0.25302107405446189268 \\ 28 & 2.0137069241232659490 & 0.25049931092298106079 \\ 29 & 2.0285221460475933143 & 0.24810865987769637268 \\ 30 & 2.0427608441715109743 & 0.24583789954688620362 \\ 31 & 2.0564640976381941372 & 0.24367711379799326984 \\ 32 & 2.0696688279289069449 & 0.24161750271345842095 \\ 33 & 2.0824083359701366048 & 0.23965122596881073012 \\ 34 & 2.0947127557684849500 & 0.23777127225118112783 \\ 35 & 2.1066094396039525939 & 0.23597134975445983004 \\ 36 & 2.1181232867564915367 & 0.23424579384730181654 \\ 37 & 2.1292770253732226709 & 0.23258948882088842374 \\ 38 & 2.1400914552352043060 & 0.23099780124849819693 \\ 39 & 2.1505856577287634253 & 0.22946652297472534804 \\ 40 & 2.1607771781750199583 & 0.22799182213242611444 \\ \end{array}$$ Unfortunately, I am not aware of a general closed-form solution for each $k$. We can attempt to fit these, e.g., $$\operatorname{E}[M_i^*(k)] \approx -0.059204467433884 \log ^2 k + 0.79407613941480 \log k + 0.026795590426391, \\ \operatorname{Var}[M_i^*(k)] \approx -0.45226384311138 k^{-2} + 1.23294245728553 k^{-1} + 0.21144333738729,$$ but this is not particularly illuminating. For large $k$, it may be better to use asymptotic results from extreme value theory instead.
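The numerical integration behind the table is straightforward to reproduce. A minimal sketch in Python with SciPy (not necessarily how the values above were computed; the function name is mine), integrating $x\,f(x)$ and $x^2 f(x)$ against the pdf $k\,\Phi(x)^{k-1}\varphi(x)$:

```python
import numpy as np
from scipy import integrate, stats

def max_order_stat_moments(k):
    """Mean and variance of the maximum of k i.i.d. standard normals,
    by numerically integrating against the pdf k * phi(x) * Phi(x)^(k-1)."""
    pdf = lambda x: k * stats.norm.pdf(x) * stats.norm.cdf(x) ** (k - 1)
    mean, _ = integrate.quad(lambda x: x * pdf(x), -np.inf, np.inf)
    second, _ = integrate.quad(lambda x: x**2 * pdf(x), -np.inf, np.inf)
    return mean, second - mean**2

for k in (2, 10, 40):
    m, v = max_order_stat_moments(k)
    print(f"k={k:2d}  E={m:.10f}  Var={v:.10f}")
```

As a sanity check, for $k = 2$ this agrees with the known closed forms $\operatorname{E}[M_i^*(2)] = 1/\sqrt{\pi}$ and $\operatorname{Var}[M_i^*(2)] = 1 - 1/\pi$, matching row 2 of the table.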