Expected maximum of a Zipf sample

96 Views Asked by At

A Zipf random variable $X\sim Z(\alpha,D)$ takes values in ${1,\ldots, D}$ such that a value $x$ is selected with probability proportional to $x^{-\alpha}$.

More precisely, $\Pr[X=x]=\dfrac{x^{-\alpha}}{\sum_{i=1}^{D}i^{-\alpha}}$.

Assume you have a set $\{X_i\}_{i=1}^{N}$ of i.i.d. random variables, each is distributed $Z(\alpha, D)$.

What is $\mathbb E \left[\max\{X_i\}_{i=1}^N \right]$? (as a function of $\alpha, D$, and $N$).

If it is hard to compute, can we approximate it?


What if $D=\infty$ (and $\alpha>1$)?

1

There are 1 best solutions below

3
On BEST ANSWER
  1. Consider $\alpha> 1$. We'll go far enough in the distribution so that $nP(X>j)$ gets small which will imply that it's unlikely that there are two or more deviations above $j$ and hence we can approximate $P(M>j)\approx nP(X>j)$ for large $j$ and deal with smaller $j$'s directly.

We'll use

$$E(M)=\sum_{i\ge 0}{(1-(1-\bar F(i))^n)}\tag 1$$

where $\bar F=1-F_X$. The following is a good approximation for non-tiny $j$'s:

$$\bar F(j)\approx\int_{j+1/2}^{D+1/2}x^{-\alpha} \, dx=\frac 1 C\frac 1 {\alpha -1}\left(\frac 1 {(j+1/2)^{\alpha -1}}-\frac 1 {(D+1/2)^{\alpha -1}}\right) \tag 2$$ where $$C=\sum_{i=1}^Di^{-\alpha}\approx\frac 1 2+\frac 1 2D^{-\alpha}+\int_1^Dx^{-\alpha}\,dx=\frac 1 2+\frac 1 {\alpha-1}(1-D^{1-\alpha})$$ is a constant of proportionality in the pdf. As soon as $n\bar F(j)$ becomes small (say $<\frac 1 {\lambda}$ for $j>L$ and set $\lambda=10$ for orders of magnitude), we can expand $(1)$ linearly:

$$E(M)\approx \sum_{i=0}^L(1-(1-\bar F(i))^n)+\sum_{i=L+1}^D n\bar F(i)$$

where we can further approximate the second sum by an integral using $(2)$. $(2)$ also tells us how to pick $L$. We need to consider several cases - depending on how small $\alpha-1$ is. If ${n\lambda}$ is order of magnitude less than ${D^{\alpha-1}}$, it's enough to pick $L=(n\lambda)^{\frac 1 {\alpha-1}}$ which is especially nice (sublinear in $n$) if $\alpha>2$. I'll finish the analysis for smaller but not too small $\alpha-1$ a bit later. $\alpha-1$ so small that $D^{\alpha-1}\approx 1$ can be approximated by the case $\alpha = 1$ considered below.

Summarizing - compute the first few terms in $(1)$ exactly, then use approximation $(2)$ while $i\le L$ and approximate the rest by another integral that can be computed exactly. This doesn't work for infinite $D$ unless $\alpha>2$, since otherwise $E(X)=\infty$.

  1. For $\alpha=1$, we get $$C\approx \frac 1 2+\log D$$ $E(M_n)\ge E(X)=\frac DC\approx \frac D {\log D}$ which is intractible in your case as far as I understand.

  2. Case $\alpha<1$ is even worse. Integral approximation to discrete sums yields

$$F(j)\approx (\frac j D)^{1-\alpha}$$

and hence, by doing another integral approximation to discrete sums in $(1)$ yields:

$$E(M)\approx D\Big(1-\frac 1 {1+n(1-\alpha)}\Big)$$

linear in $D$.