Conditions for the convergence of two sorted vectors of samples


Let $X$ be a random variable and $X_1, X_2, \ldots, X_n$ an i.i.d. sample of size $n$, with corresponding order statistics $X_{(1)},X_{(2)},\ldots,X_{(n)}$ obtained by sorting the values $X_1, X_2, \ldots, X_n$. Now draw a second, independent sample of size $n$, with corresponding order statistics $X'_{(1)},X'_{(2)},\ldots,X'_{(n)}$.

It seems that the Pearson linear correlation between $X_{(1)},X_{(2)},\ldots,X_{(n)}$ and $X'_{(1)},X'_{(2)},\ldots,X'_{(n)}$ (two vectors of $n$ sorted samples from the same distribution) converges to $1$ as $n$ tends to infinity, i.e.:

$$ \frac{\sum_{i=1}^n(X_{(i)}-\bar{X})(X'_{(i)}-\bar{X'})} {\sqrt{\sum_{i=1}^n(X_{(i)}-\bar{X})^2} \sqrt{\sum_{i=1}^n(X'_{(i)}-\bar{X'})^2}} \overset{\text{a.s.}}{\to}1 $$ as $n\to\infty$ where $\bar{X}=\frac{1}{n}\sum_{i=1}^nX_{(i)}$ and $\bar{X'}=\frac{1}{n}\sum_{i=1}^nX'_{(i)}$.

The convergence can be checked empirically in R: `n <- 1e4; cor(sort(rlnorm(n)), sort(rlnorm(n)))` gives results that are closer and closer to 1 as `n` grows.

This seems to be a simple problem, but I could not find any reference on it. What type of convergence is this (almost sure, in probability, ...), and what conditions on the distribution are needed for it to hold?
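The empirical observation can also be reproduced in Python (a standalone illustrative sketch mirroring the R snippet; the helper names `pearson` and `sorted_sample_corr` are mine, not from the question):

```python
import random

def pearson(x, y):
    # Plain Pearson correlation, written out to avoid dependencies.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def sorted_sample_corr(n, rng):
    # Correlation between two independently drawn, sorted lognormal samples.
    x = sorted(rng.lognormvariate(0.0, 1.0) for _ in range(n))
    y = sorted(rng.lognormvariate(0.0, 1.0) for _ in range(n))
    return pearson(x, y)

rng = random.Random(0)
for n in (100, 1000, 10000):
    print(n, sorted_sample_corr(n, rng))
```

The printed correlations tend toward 1 as `n` grows, matching the R experiment.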

Accepted answer

With two colleagues, we were able to prove the convergence in probability for the uniform distribution on $[0,1]$. By Chebyshev's inequality, it suffices to show that the expected value of the quantity of interest converges to the right limit and that its variance tends to $0$ as $n\to\infty$.

The key quantity is $S_n=\frac1{n}\sum_{i=1}^nX_{(i)}X^\prime_{(i)}$; we first study the limit of $E\left[S_n\right]$ as $n\to\infty$.

\begin{eqnarray} E\left[S_n\right] & = & \frac1{n} \sum_{i=1}^n E\left[X_{(i)}X^\prime_{(i)}\right] \\ & = & \frac1{n} \sum_{i=1}^n E\left[X_{(i)}\right] E\left[X^\prime_{(i)}\right] \end{eqnarray}

where the second equality holds because the two samples are independent, so $X_{(i)}$ and $X^\prime_{(i)}$ are independent.

The expectation of the order statistics of the uniform distribution on $[0,1]$ is $E\left[X_{(i)}\right]=\frac{i}{n+1}$.

Thus, $E\left[S_n\right]=\frac1{n} \sum_{i=1}^n\left(\frac{i}{n+1}\right)^2=\frac1{n(n+1)^2}\sum_{i=1}^n i^2=\frac1{6}\frac{(2n+1)}{(n+1)}$.
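As a quick sanity check on this algebra (an illustrative snippet, not part of the original answer; `expected_S` and `closed_form` are my own helper names), the sum can be compared against the closed form with exact rational arithmetic:

```python
from fractions import Fraction

def expected_S(n):
    # E[S_n] = (1/n) * sum_{i=1}^n (i/(n+1))^2, computed exactly.
    return Fraction(sum(i * i for i in range(1, n + 1)), n * (n + 1) ** 2)

def closed_form(n):
    # The claimed closed form (2n+1) / (6(n+1)).
    return Fraction(2 * n + 1, 6 * (n + 1))

for n in (1, 5, 100):
    assert expected_S(n) == closed_form(n)
print(float(closed_form(10**6)))  # approaches 1/3 for large n
```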

Therefore, $E\left[S_n\right]\to\frac{1}{3}$ as $n\to\infty$, which is indeed equal to $E\left[X^2\right]=V\left[X\right]+E\left[X\right]^2=\frac1{12}+\left(\frac12\right)^2$ for the uniform distribution.

Now, let's consider the limit of the variance:

\begin{align*} S_n^2&=\frac{1}{n^2}\left(\sum_{i=1}^n X_{(i)}X_{(i)}^\prime\right)^2\\ &\leq \frac{1}{n^2}\left(\sum_{i=1}^n\left( \frac{1}{2}X_{(i)}^2+\frac{1}{2}{X_{(i)}^\prime}^2\right)\right)^2\\ &= \frac{1}{n^2}\left[\left(\frac{1}{2}\sum_{i=1}^n X_{(i)}^2\right)^2+\left(\frac{1}{2}\sum_{i=1}^n{X_{(i)}^\prime}^2\right)^2+2\left(\frac{1}{2}\sum_{i=1}^n X_{(i)}^2\right)\left(\frac{1}{2}\sum_{i=1}^n{X_{(i)}^\prime}^2\right)\right]\\ \end{align*}

Taking expectations, the two squared terms have equal expectations since the samples are identically distributed, and by independence the cross term factors as $E\left[\sum_{i=1}^n X_{(i)}^2\right]E\left[\sum_{i=1}^n{X_{(i)}^\prime}^2\right]=\left(E\left[\sum_{i=1}^n X_{(i)}^2\right]\right)^2\leq E\left[\left(\sum_{i=1}^n X_{(i)}^2\right)^2\right]$ by Jensen's inequality. Hence

\begin{align*} E[S_n^2]&\leq \frac{1}{n^2}E\left[\left(\sum_{i=1}^nX_{(i)}^2\right)^2\right] \end{align*}

Thus,

\begin{align*} E[S_n^2]-E[S_n]^2&\leq \frac{1}{n^2}E\left[\left(\sum_{i=1}^nX_{(i)}^2\right)^2\right]-E[S_n]^2\\ &=E\left[\left(\frac{1}{n}\sum_{i=1}^nX_{(i)}^2\right)^2-E[S_n]^2\right]\\ &=E\left[\left(\left(\frac{1}{n}\sum_{i=1}^nX_{(i)}^2\right)-E[S_n]\right)\left(\left(\frac{1}{n}\sum_{i=1}^nX_{(i)}^2\right)+E[S_n]\right)\right]\\ \end{align*}

Let $A_n=\left(\frac{1}{n}\sum_{i=1}^nX_{(i)}^2\right)-E[S_n]$ and $B_n=\left(\frac{1}{n}\sum_{i=1}^nX_{(i)}^2\right)+E[S_n]$.

We have thus

\begin{equation} E[S_n^2]-E[S_n]^2\leq E[A_nB_n] \end{equation}

Since each $X_{(i)}$ lies in $[0,1]$ and $E[S_n]\leq 1/3$, we have $0\leq B_n\leq 1+1/3=4/3$. Writing $Y_n=\frac{1}{n}\sum_{i=1}^nX_{(i)}^2$, so that $A_nB_n=Y_n^2-E[S_n]^2$, we get

\begin{equation} E[A_nB_n] = V[Y_n] + \big(E[Y_n]-E[S_n]\big)\big(E[Y_n]+E[S_n]\big) \leq V[Y_n] + \frac{4}{3}E[A_n], \end{equation}

since $E[Y_n]=E[X^2]=\frac13$ and $E[S_n]\leq\frac13$, so that $E[Y_n]+E[S_n]\leq\frac{4}{3}$ and $E[Y_n]-E[S_n]=E[A_n]\geq 0$. Moreover, sorting does not change the sum, so $Y_n=\frac{1}{n}\sum_{i=1}^nX_i^2$ is an average of i.i.d. bounded variables and $V[Y_n]=V[X^2]/n\to0$. This yields

\begin{equation} E[S_n^2]-E[S_n]^2\leq \frac{V[X^2]}{n}+\frac{4}{3}E[A_n] \end{equation}

It remains to compute $E[A_n]$:

\begin{align*} E[A_n]&= E\left[\frac{1}{n}\sum_{i=1}^nX_{(i)}^2\right]-E[S_n]\\ &=\frac{1}{n}\sum_{i=1}^n \left(V\left[X_{(i)}\right]+E[X_{(i)}]^2\right)-E[S_n]\\ &=\frac{1}{n}\sum_{i=1}^n V\left[X_{(i)}\right]\\ &=\frac{1}{n}\sum_{i=1}^n\frac{i(n+1-i)}{(n+1)^2(n+2)}\\ &=\frac{1}{n(n+1)^2(n+2)}\cdot\frac{n(n+1)(n+2)}{6}\\ &=\frac{1}{6(n+1)} \end{align*}

using the fact that $X_{(i)}\sim\mathrm{Beta}(i,\,n+1-i)$, so $V\left[X_{(i)}\right]=\frac{i(n+1-i)}{(n+1)^2(n+2)}$, and $\sum_{i=1}^n i(n+1-i)=\frac{n(n+1)(n+2)}{6}$.
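The value $E[A_n]=\frac{1}{6(n+1)}$ can be verified with exact rational arithmetic, using $V[X_{(i)}]=\frac{i(n+1-i)}{(n+1)^2(n+2)}$ for the $\mathrm{Beta}(i,\,n+1-i)$ distributed order statistic (a standalone Python check; `expected_A` is my own helper name):

```python
from fractions import Fraction

def expected_A(n):
    # E[A_n] = (1/n) * sum_i Var[X_(i)], with X_(i) ~ Beta(i, n+1-i),
    # so Var[X_(i)] = i(n+1-i) / ((n+1)^2 (n+2)).
    s = sum(Fraction(i * (n + 1 - i), (n + 1) ** 2 * (n + 2))
            for i in range(1, n + 1))
    return s / n

for n in (1, 7, 200):
    assert expected_A(n) == Fraction(1, 6 * (n + 1))
```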

Therefore, $E[A_n]\to0$ and hence $V\left[S_n\right]\to0$ as $n\to\infty$, which concludes the proof.


This is not an answer, but too long for a comment.

In view of the law of large numbers (note that $\frac1n \sum_{i=1}^n X^{2}_{(i)} = \frac1n \sum_{i=1}^n X^{2}_{i} \to E[X^2]$ almost surely), the convergence in probability $$ \frac1n \sum_{i=1}^n X^{\vphantom{'}}_{(i)}X'_{(i)} \to E[X^2],\ n\to\infty, \tag{1} $$ is equivalent to $$ \frac1n \sum_{i=1}^n\big(X^{2}_{(i)} + X'^2_{(i)} - 2X^{\vphantom{'}}_{(i)}X'_{(i)}\big) = \frac1n \sum_{i=1}^n\big(X^{\vphantom{'}}_{(i)}-X'_{(i)}\big)^2 \to 0,\ n\to \infty. $$

This is equivalent$^*$ to $$ \frac1n \sum_{i=1}^n E\big[\big(X^{\vphantom{'}}_{(i)}-X'_{(i)}\big)^2\big] \to 0,\ n\to \infty. $$ But, since $X^{\vphantom{'}}_{(i)}-X'_{(i)}$ has mean $0$ and the two samples are independent, $$ E\big[\big(X^{\vphantom{'}}_{(i)}-X'_{(i)}\big)^2\big] = V(X^{\vphantom{'}}_{(i)}-X'_{(i)}) = V(X^{\vphantom{'}}_{(i)})+V(X'_{(i)}) = 2V(X^{\vphantom{'}}_{(i)}) \\ = 2E[X^{2}_{(i)}] - 2\big(E[X^{\vphantom{'}}_{(i)}]\big)^2. $$ Further, $$ \frac{1}{n} \sum_{i=1}^n E[X^{2}_{(i)}] = E\left[\frac1n\sum_{i=1}^n X^{2}_{(i)}\right] = E\left[\frac1n\sum_{i=1}^n X^{2}_{i}\right] = E[X^2]. $$

Therefore, $(1)$ is equivalent to $$ \frac1n \sum_{i=1}^n \big(E[X^{\vphantom{'}}_{(i)}]\big)^2 \to E[X^2],\ n\to\infty. $$

This seems easier than $(1)$ (since it involves no randomness) and can be proved, e.g., for bounded random variables.
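The identity $E\big[(X_{(i)}-X'_{(i)})^2\big]=2V(X_{(i)})$ above can be spot-checked by Monte Carlo for the uniform distribution, where $X_{(i)}\sim\mathrm{Beta}(i,\,n+1-i)$ (an illustrative Python sketch with my own helper names, not part of the answer):

```python
import random

def mean_sq_gap(n, i, reps, rng):
    # Monte Carlo estimate of E[(X_(i) - X'_(i))^2] for Uniform(0,1) samples.
    total = 0.0
    for _ in range(reps):
        x = sorted(rng.random() for _ in range(n))
        y = sorted(rng.random() for _ in range(n))
        total += (x[i - 1] - y[i - 1]) ** 2
    return total / reps

def two_var_beta(n, i):
    # 2 * Var[X_(i)] with X_(i) ~ Beta(i, n+1-i).
    return 2 * i * (n + 1 - i) / ((n + 1) ** 2 * (n + 2))

rng = random.Random(1)
n, i = 20, 10
print(mean_sq_gap(n, i, 20000, rng), two_var_beta(n, i))
```

With 20000 replications the two printed numbers agree to within Monte Carlo noise.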


$^*$ $\Leftarrow$ is obvious. For $\Rightarrow$, write, using the rearrangement inequality (the sorted pairing minimizes the sum of squared differences), $$ \frac1n \sum_{i=1}^n\big(X^{\vphantom{'}}_{(i)}-X'_{(i)}\big)^2 \le \frac1n \sum_{i=1}^n\big(X^{\vphantom{'}}_{i}-X'_{i}\big)^2 = \frac1n\sum_{i=1}^n Y_i, $$ where the $Y_i = \big(X^{\vphantom{'}}_{i}-X'_{i}\big)^2$ are iid and integrable. Then it is not hard to show the uniform integrability of the averages $\frac1n\sum_{i=1}^n Y_i$.
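The rearrangement step in the footnote (sorted pairing never increases the sum of squared differences) can be checked numerically (an illustrative Python sketch; `sq_gap` is a hypothetical helper, not from the answer):

```python
import random

def sq_gap(x, y):
    # Sum of squared componentwise differences.
    return sum((a - b) ** 2 for a, b in zip(x, y))

rng = random.Random(2)
for _ in range(100):
    x = [rng.random() for _ in range(10)]
    y = [rng.random() for _ in range(10)]
    # Sorting both vectors can only decrease (or keep) the squared gap.
    assert sq_gap(sorted(x), sorted(y)) <= sq_gap(x, y) + 1e-12
```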