A convergence in probability associated to order statistics


Consider $n$ independent random variables $U_1,\dots,U_n$ uniformly distributed on $[0,1]$. Order them like so: $$U_{(1)}\leq U_{(2)}\leq\cdots\leq U_{(n)}$$

Let $x>0$ be given. How does one prove the following assertion: $$J_n=\frac1n\sum_{k=1}^n\mathbf{1}_{\{n(U_{(k)}-U_{(k-1)})>x\}}\xrightarrow[n~\to~\infty]{~\text{in probability}~}e^{-x}~?$$ (We set $U_{(0)}=0$.)
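A quick Monte Carlo sketch (plain numpy; the value $x=1$, the seed, and the sample sizes are arbitrary choices) makes the claimed limit plausible:

```python
import numpy as np

rng = np.random.default_rng(0)
x = 1.0  # the fixed threshold x > 0 (arbitrary choice)

def J_n(n, rng):
    """One realization of J_n: the fraction of the n normalized spacings
    n*(U_(k) - U_(k-1)), with U_(0) = 0, that exceed x."""
    u = np.sort(rng.uniform(size=n))
    spacings = np.diff(u, prepend=0.0)  # U_(k) - U_(k-1) for k = 1..n
    return np.mean(n * spacings > x)

for n in [10, 100, 1000, 10000]:
    samples = [J_n(n, rng) for _ in range(200)]
    print(n, np.mean(samples), np.std(samples), "target:", np.exp(-x))
```

For $x=1$ the empirical means approach $e^{-1}\approx 0.368$ and the spread shrinks with $n$, which is exactly what convergence in probability to the constant $e^{-x}$ requires.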


What I have tried / observations:

  • Since we are trying to establish convergence in probability to a constant, it suffices to show convergence in distribution (see Wikipedia).
  • We can thus try to show either (a) pointwise convergence of the characteristic functions $\Phi_n=\Phi_{J_n}$ to $\Phi_{e^{-x}}$, or (b) convergence of the cumulative distribution functions $F_n=F_{J_n}$ to $F_{e^{-x}}$ at every continuity point of the limit distribution function, that is, on $\mathbb{R}\setminus\{e^{-x}\}$.
  • The $J_n$ take only finitely many values, namely the fractions $k/n$ with $k\in\{0,1,\dots,n\}$, and thus the $F_n$ are step functions.
  • The problem asks one to establish the fact that $$\big(U_{(1)},\dots,U_{(n)}\big)\sim\Big(\frac{T_{1}}{T_{n+1}},\dots,\frac{T_{n}}{T_{n+1}}\Big),$$ where $T_k=\mathscr{E}_1+\cdots+\mathscr{E}_k$ and the $\mathscr{E}_i$, $i=1,\dots,n+1$, are i.i.d. with the exponential distribution of parameter $1$, as well as the fact that $$nU_{(1)}\xrightarrow[n~\to~\infty]{~\text{in distribution}~}\operatorname{Exp}(1)$$ (both facts are checked numerically in the sketch after this list).
  • Another observation: convergence of $J_n$ to $e^{-x}$ in probability is equivalent to $\mathbb{E}[|J_n-e^{-x}|\wedge 1]\to 0$; since $-1\leq J_n-e^{-x}\leq 1$, this is the same as $J_n\to e^{-x}$ in $L^1$.
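Here is the promised sketch (numpy/scipy; sizes and seed are arbitrary). It compares the smallest order statistic under the two constructions with a two-sample Kolmogorov–Smirnov test, and tests $nU_{(1)}$ against $\operatorname{Exp}(1)$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 500, 5000

# (T_1/T_{n+1}, ..., T_n/T_{n+1}) with T_k = E_1 + ... + E_k, E_i i.i.d. Exp(1)
E = rng.exponential(size=(reps, n + 1))
T = np.cumsum(E, axis=1)
U_exp = T[:, :n] / T[:, [n]]   # each row distributed like (U_(1), ..., U_(n))

# Order statistics of n i.i.d. uniforms, for comparison
U_dir = np.sort(rng.uniform(size=(reps, n)), axis=1)

# Two-sample KS test on the smallest order statistic under both constructions
print(stats.ks_2samp(U_exp[:, 0], U_dir[:, 0]).pvalue)
# One-sample KS test: n * U_(1) should be approximately Exp(1)
print(stats.kstest(n * U_dir[:, 0], "expon").pvalue)
```

Both p-values come out large, consistent with the two quoted facts.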

The obstacle I don't know how to get around is that the $n(U_{(k)}-U_{(k-1)})$ aren't independent. The facts quoted in the last bullet point entail that each $\mathbf{1}_{\{n(U_{(k)}-U_{(k-1)})>x\}}$ has the same expectation, namely $(1-x/n)^n$ (for $n\geq x$), which tends to $e^{-x}$ as $n\to\infty$. Unfortunately the law of large numbers cannot be applied (and in any case, we are only asked to show convergence in probability).


There are 2 answers below.

First answer:

You can bound the covariances of the summands well enough that the Chebyshev inequality applies. Start from $$ \mathbb P\left(|J_n-e^{-x}|>2\varepsilon\right)\leq \mathbb P\left(|J_n-\mathbb E[J_n]|>\varepsilon\right)+\mathbb P\left(|\mathbb E[J_n]-e^{-x}|>\varepsilon\right). $$ The second term vanishes for large $n$, since $\mathbb E[J_n]=(1-x/n)^n\to e^{-x}$, and the first one is bounded by $$\mathbb P\left(|J_n-\mathbb E[J_n]|>\varepsilon\right)\leq \dfrac{\text{Var}(J_n)}{\varepsilon^2} = \dfrac{\text{Var}\left(\sum_{k=1}^n\mathbf{1}_{\{n(U_{(k)}-U_{(k-1)})>x\}}\right)}{n^2\varepsilon^2}. $$

Next, since the spacings are exchangeable, for any fixed $j\neq k$, $$ \text{Var}\left(\sum_{k=1}^n\mathbf{1}_{\{n(U_{(k)}-U_{(k-1)})>x\}}\right) = n\,\text{Var}\left(\mathbf{1}_{\{nU_{(1)}>x\}}\right) + n(n-1)\,\text{Cov}\left(\mathbf{1}_{\{n(U_{(k)}-U_{(k-1)})>x\}},\mathbf{1}_{\{n(U_{(j)}-U_{(j-1)})>x\}}\right). $$ Computing the covariance exactly seems hard, but an upper bound suffices. Using the representation $U_{(i)}=T_i/T_{n+1}$, for $n>x$, $$ \mathbb P\left(n(U_{(k)}-U_{(k-1)})>x,\ n(U_{(j)}-U_{(j-1)})>x\right) = \mathbb P\left(\mathscr{E}_k - \frac{x}{n-x}\mathscr{E}_j > \frac{x}{n-x}T_{n+1}^{j,k},\;\mathscr{E}_j - \frac{x}{n-x}\mathscr{E}_k > \frac{x}{n-x}T_{n+1}^{j,k}\right)=:P_1, $$ where $T_{n+1}^{j,k}=T_{n+1}-\mathscr{E}_j-\mathscr{E}_k$. Dropping the nonpositive term $-\frac{x}{n-x}\mathscr{E}_j$ from the first inequality gives $$ P_1\leq \mathbb P\left(\mathscr{E}_k > \frac{x}{n-x}T_{n+1}^{j,k},\;\mathscr{E}_j - \frac{x}{n-x}\mathscr{E}_k > \frac{x}{n-x}T_{n+1}^{j,k}\right)=:P_2, $$ and $P_2$ can be computed by conditioning on $T_{n+1}^{j,k}$, a sum of $n-1$ i.i.d. $\operatorname{Exp}(1)$ variables: $$ P_2=\int\limits_0^\infty \frac{y^{n-2}}{(n-2)!}e^{-y}\; \mathbb P\left(\mathscr{E}_k>\frac{x}{n-x}\,y,\ \mathscr{E}_j > \frac{x}{n-x}(y+\mathscr{E}_k)\right)dy = \left(1-\frac{x}{n}\right)^{2n-1}. $$ Therefore $$ \text{Cov}\left(\mathbf{1}_{\{n(U_{(k)}-U_{(k-1)})>x\}},\mathbf{1}_{\{n(U_{(j)}-U_{(j-1)})>x\}}\right)\leq \left(1-\frac{x}n\right)^{2n-1} - \left(1-\frac{x}{n}\right)^{2n} = \frac{x}{n}\left(1-\frac{x}{n}\right)^{2n-1}. $$ The right-hand side is $O(1/n)$, so $n(n-1)\,\text{Cov}=O(n)$, hence $\text{Var}(J_n)=O(1/n)\to 0$, and convergence in probability follows from the Chebyshev inequality.
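Not part of the original argument, but a small numpy sketch (arbitrary parameters) that recovers the common pairwise covariance from the variance of the sum, using exchangeability, and confirms that $\text{Var}(J_n)$ decays like $1/n$:

```python
import numpy as np

rng = np.random.default_rng(2)
x = 1.0

for n in [50, 200, 500]:
    reps = 20_000
    u = np.sort(rng.uniform(size=(reps, n)), axis=1)
    I = n * np.diff(u, prepend=0.0, axis=1) > x   # indicator matrix, reps x n
    S = I.sum(axis=1)                             # sum of the n indicators
    # By exchangeability, Var(S) = n Var(I_1) + n(n-1) Cov, so the common
    # pairwise covariance can be recovered from the variance of the sum.
    cov = (S.var() - n * I[:, 0].var()) / (n * (n - 1))
    print(f"n={n}: pairwise Cov ~ {cov:+.2e}, Var(J_n) ~ {S.var() / n**2:.2e}")
```

The estimated covariance is small (in fact slightly negative), comfortably within the $O(1/n)$ bound derived above.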

Second answer:

Instead of studying the order statistics themselves, let us look at the lengths of the intervals between them: let $$ L_i=U_{(i)}- U_{(i-1)} \qquad \text{for } 1\leq i\leq n+1,$$ where we use the convention that $U_{(0)}=0$ and $U_{(n+1)}=1$. The random vector $(L_1,\dots,L_{n+1})$ has the uniform distribution on the simplex $$ \left\{ (x_1,\dots,x_{n+1}) : x_1+\cdots+x_{n+1}=1, \quad x_1,\dots,x_{n+1}\geq 0 \right\}. $$

An alternative way to generate a random vector with this probability distribution is to consider the vector $$ \left( \frac{Z_1}{Z_1+\cdots+Z_{n+1}}, \dots, \frac{Z_{n+1}}{Z_1+\cdots+Z_{n+1}} \right), $$ where $Z_1,\dots,Z_{n+1}$ are independent random variables with the exponential distribution $\operatorname{Exp}(1)$. (TODO: missing reference for the proof of this result.)

Our goal is to study the probability distribution of the number $N$ of indices $k\in\{1,\dots,n\}$ with the property that $n L_k > x$ (so that $J_n=N/n$) or, equivalently, such that $$ Z_k > x \cdot \frac{Z_1+\cdots+Z_{n+1}}{n}.$$ By the Bienaymé–Chebyshev inequality, the right-hand side converges in probability to $x$.
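A brief numerical check of this step (numpy sketch; seed and parameters are arbitrary): the random threshold concentrates near $x$, and $N/n$ already sits near $e^{-x}$:

```python
import numpy as np

rng = np.random.default_rng(3)
x, n, reps = 1.0, 2000, 1000

# N = #{k <= n : Z_k > x * (Z_1 + ... + Z_{n+1}) / n}
Z = rng.exponential(size=(reps, n + 1))
threshold = x * Z.sum(axis=1) / n
N = (Z[:, :n] > threshold[:, None]).sum(axis=1)

print("random threshold: mean ~", threshold.mean(), ", std ~", threshold.std())
print("N/n: mean ~", (N / n).mean(), ", target e^{-x} =", np.exp(-x))
```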

Take $\epsilon>0$ and denote by $N'$ the number of indices such that $$ Z_k > x +\epsilon.$$ On the event $\left\{ x \cdot \frac{Z_1+\cdots+Z_{n+1}}{n} \leq x+\epsilon \right\}$ every index counted in $N'$ is also counted in $N$, so $$ \mathbb{P}\left\{ N < N' \right\} \leq \mathbb{P}\left\{ x \cdot \frac{Z_1+\cdots+Z_{n+1}}{n} > x+\epsilon \right\} \xrightarrow{n\to\infty} 0.$$

Since the random events counted in $N'$ are independent, each with probability $\mathbb{P}\{Z_k > x+\epsilon\}=e^{-x-\epsilon}$, the law of large numbers implies that $$ \frac{N'}{n} \xrightarrow{P} e^{-x-\epsilon}. $$

The union bound implies that $$ \mathbb{P}\left\{ \frac{N}{n} < e^{-x -\epsilon} - \epsilon \right\} \leq \mathbb{P}\left\{ N < N'\right\} + \mathbb{P}\left\{ \frac{N'}{n} < e^{-x -\epsilon} - \epsilon \right\} $$ converges to zero for each $\epsilon>0$.

Analogously, introducing the number $N''$ of indices with $Z_k > x-\epsilon$, one can show that the probability of the event $$ \frac{N}{n} > e^{-x +\epsilon} + \epsilon $$ converges to zero for each $\epsilon>0$. Letting $\epsilon\to 0$, the two bounds together give $J_n = N/n \xrightarrow{P} e^{-x}$, as required. (A numerical illustration of this sandwich argument is sketched below.)
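To close, a small simulation of the sandwich argument (numpy sketch; `N1` and `N2` stand for $N'$ and $N''$, and the seed and parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
x, eps, n, reps = 1.0, 0.1, 5000, 400

Z = rng.exponential(size=(reps, n + 1))
thr = x * Z.sum(axis=1, keepdims=True) / n

N  = (Z[:, :n] > thr).sum(axis=1)        # random threshold x * (Z_1+...+Z_{n+1})/n
N1 = (Z[:, :n] > x + eps).sum(axis=1)    # N'  (deterministic threshold x + eps)
N2 = (Z[:, :n] > x - eps).sum(axis=1)    # N'' (deterministic threshold x - eps)

print("P(N < N')  ~", (N < N1).mean())   # should be ~ 0 for large n
print("P(N > N'') ~", (N > N2).mean())   # likewise, for the upper bound
print("means of N'/n, N/n, N''/n:", N1.mean() / n, N.mean() / n, N2.mean() / n)
print("e^{-(x+eps)}, e^{-x}, e^{-(x-eps)}:",
      np.exp(-x - eps), np.exp(-x), np.exp(-x + eps))
```

The output shows $N'/n \lesssim N/n \lesssim N''/n$ with the two failure probabilities essentially zero, matching the two union-bound estimates above.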