Expected difference for two largest samples


I'm trying to get an understanding of how the difference between the two largest samples drawn from some distribution $f(x)$ scales with $n$. That is, as we increase the size of our sample, how big can we expect the difference between the two largest elements we draw to be? Supposing it scales as $O(n^{-\alpha})$, is $\alpha$ more or less than $1$?

Suppose that $x \in [0,1]$ for simplicity (the important fact is that it is bounded).

So far I think I have (making use of basic results on order statistics):

$$ \mathbb{E}[X_{(n)}-X_{(n-1)}] = \mathbb{E}[X_{(n)}]-\mathbb{E}[X_{(n-1)}] = \int_0^1(1-F_{X_{(n)}}(x))dx- \int_0^1(1-F_{X_{(n-1)}}(x))dx $$

$$ = \int_0^1\left(F_{X_{(n-1)}}(x)-F_{X_{(n)}}(x)\right)dx = \int_0^1 \left(nF_X(x)^{n-1}(1-F_X(x))+F_X(x)^n-F_X(x)^n\right)dx $$

$$ = n\int_0^1 F_X(x)^{n-1}(1-F_X(x))\,dx $$

Assuming that is right, I'm not sure where I can go from here. From what I can see, the distribution I'm looking at is roughly Gaussian bounded in a small subset of $[0,1]$, so I could in principle invoke some more assumptions on $F_X(x)$, but I wonder if there's a general result that is known/can be derived.
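As a quick sanity check on the formula, here is a throwaway Monte Carlo sketch (the helper name `mc_gap` is mine) comparing the empirical mean of $X_{(n)}-X_{(n-1)}$ for uniform samples against $n\int_0^1 F^{n-1}(1-F)\,dx$, which for $F(x)=x$ evaluates to $1/(n+1)$:

```python
import random

def mc_gap(n, trials=5000):
    """Monte Carlo estimate of E[X_(n) - X_(n-1)] for n i.i.d. Uniform(0,1) draws."""
    total = 0.0
    for _ in range(trials):
        xs = sorted(random.random() for _ in range(n))
        total += xs[-1] - xs[-2]  # gap between the two largest order statistics
    return total / trials

# For F(x) = x, the integral n * ∫_0^1 F^(n-1)(1-F) dx equals 1/(n+1),
# so the Monte Carlo estimate should hover near that value.
for n in (10, 100, 1000):
    print(n, mc_gap(n), 1 / (n + 1))
```

For the uniform case the gap shrinks like $1/n$, i.e. $\alpha = 1$ in the question's notation.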

1 Answer


This is a partial answer that shows an upper bound of $O(\log(n)/n)$ under smoothness assumptions on $F$ near $F^{-1}(1)$. I'd be interested in arguments without such an assumption. My gut says that the uniform density is probably the worst case, and a generic $O(1/n)$ upper bound should be within reach. Perhaps a calculus of variations approach?


Notice that for $u \in [0,1]$, it holds that $ku^{k-1}(1-u) < 1/k^2$ if $u > 1-1/k^3$ (since then $ku^{k-1}(1-u) \le k(1-u) < 1/k^2$) or if $u < k^{-3/(k-1)} = \exp\left(-3 \log(k)/(k-1) \right)$ (since then $u^{k-1} < 1/k^3$). Since $1-x < e^{-x}$, the latter bound also holds if $u < 1- 3\log(k)/(k-1)$.
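These elementary bounds can be spot-checked numerically; the sketch below (the function name `check` is mine) evaluates $g(u) = ku^{k-1}(1-u)$ on grids over $u > 1-1/k^3$, where $g(u) \le k(1-u) < 1/k^2$, and over $u < k^{-3/(k-1)}$, where $u^{k-1} < 1/k^3$:

```python
def check(k, num=2000):
    """Spot-check k*u^(k-1)*(1-u) < 1/k^2 on the two ranges used above."""
    g = lambda u: k * u ** (k - 1) * (1 - u)
    lo = 1 - 1 / k**3          # range u in (1 - 1/k^3, 1]
    hi = k ** (-3 / (k - 1))   # range u in [0, k^(-3/(k-1))]
    upper = all(g(lo + (1 - lo) * i / num) < 1 / k**2 for i in range(num + 1))
    lower = all(g(hi * i / num) < 1 / k**2 for i in range(num + 1))
    return upper and lower

for k in (5, 50, 500):
    print(k, check(k))
```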

Now let $u_n := F^{-1}(1-1/n^3)$ and let $v_n := F^{-1}(1 - 3\log(n)/n)$. Let $$I_n(F) := \int_0^1 nF^{n-1}(1-F) \,\mathrm{d}x.$$

By the above reasoning, $$ I_n \le \frac{1}{n^2} + \int_{v_n}^{u_n} nF^{n-1}(1-F)\,\mathrm{d}x.$$ The $1$ in $1/n^2$ can of course be replaced by the length of the support when it is not $[0,1]$.

Note that the integrand is $O(1)$ (its maximum over $u \in [0,1]$ is $\left(\frac{n-1}{n}\right)^{n-1} \le 1$, attained at $u = (n-1)/n$), so we don't have to worry about it too much, as long as $u_n - v_n$ is not too big. Since both $u_n$ and $v_n$ are images under $F^{-1}$ of points very close to $1$, it stands to reason that we need to understand the behaviour of $F^{-1}$ near $1$.

It is natural to try to check the validity of the (one-sided) Taylor expansion below: $$ F^{-1}(1-\varepsilon) = F^{-1}(1) - (F^{-1})'(1)\,\varepsilon + O(\varepsilon^2).$$

If the above holds, then we have $I_n \le 1/n^2 + O(u_n - v_n) = O(\log(n)/n)$ where the $O$ hides $(F^{-1})'(1)$ and other constant factors.

Since $(F^{-1})'(u) = 1/F'(F^{-1}(u)) = 1/f(F^{-1}(u)),$ where $f$ is the density, one can put forth conditions which enable something like the above - for instance, if $f$ is non-zero at $F^{-1}(1)$ (the top of the support) and smooth in some left neighbourhood of it, then we're in business. Alternatively, $f$ need not be smooth, as long as it is bounded away from $0$ in some left neighbourhood of $F^{-1}(1)$ (using the Lagrange form of the remainder).

Under conditions such as the above, we get $I_n(F) = O(\log(n)/n).$ This is nearly tight since the uniform density induces such a 'nice' $F$ and has $I_n(\mathrm{Unif}) = 1/(n+1)$.
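The closed form for the uniform case can be confirmed by direct quadrature; this midpoint-rule sketch (helper name mine) checks that $\int_0^1 n x^{n-1}(1-x)\,dx = 1/(n+1)$:

```python
def I_uniform_quad(n, steps=200000):
    """Midpoint-rule quadrature of I_n = ∫_0^1 n x^(n-1) (1-x) dx for F(x) = x."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += n * x ** (n - 1) * (1 - x)
    return total * h

for n in (2, 10, 50):
    print(n, I_uniform_quad(n), 1 / (n + 1))
```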