Finding the probability density of the difference of order statistics


The question is set up as follows: Let $Y_1, Y_2, \dots, Y_n$ be a random sample from the distribution $Uni[-1,1]$. We are asked to find the density of $U=Y_{(n)} - Y_{(1)}$ (the difference of the $n$-th and 1st order statistics).

My initial thinking was that this difference is the sample range, so the pdf I am looking for would be that of the range of a sample from the uniform distribution on $[-1,1]$, but I could not make this work mathematically. Another thought was to use moment generating functions, since I expected the mgf of the difference of order statistics to be a quotient of mgf's, but this also led nowhere. Does anyone have any advice on how to approach this problem?


There are 2 best solutions below

Best answer

Suppose that $X_1, \dotsc, X_n$ are i.i.d. with common distribution function $F$ and density $f$. First, the joint density of the extremes is given by $$ f_{X_{(1)}, X_{(n)}}(x,y)= n(n-1)(F(y)-F(x))^{n-2} f(y)f(x) I(x<y)\tag{0} $$ which can be proven from the following identity for the distribution functions: $$ P(X_{(n)}\leq y) = P(X_{(1)}\leq x, X_{(n)}\leq y) + P(X_{(1)}\gt x, X_{(n)}\leq y) $$ where $$ P(X_{(1)} > x, X_{(n)}\leq y) = \prod_{i=1}^{n} P(x\lt X_i\leq y) = (F(y) - F(x))^n $$ for $y\gt x$, since $X_{(1)} > x , X_{(n)}\leq y$ iff $x\lt X_{i}\leq y$ for all $i$. Since $P(X_{(n)}\leq y) = F(y)^n$, the joint distribution function is $F(y)^n - (F(y)-F(x))^n$ for $x\lt y$, and differentiating once in $x$ and once in $y$ gives $(0)$.
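The key identity $P(X_{(1)} > x, X_{(n)}\leq y) = (F(y) - F(x))^n$ can be sanity-checked by simulation. The following is only a minimal sketch, assuming the $Uni[-1,1]$ case from the question; the function names, the number of trials and the seed are illustrative choices, not part of the argument.

```python
import random


def uniform_cdf(t):
    """CDF of Uniform(-1, 1), clipped to [0, 1] outside the support."""
    return min(1.0, max(0.0, (t + 1.0) / 2.0))


def predicted(n, x, y):
    """Closed form (F(y) - F(x))^n, meant for x < y."""
    return (uniform_cdf(y) - uniform_cdf(x)) ** n


def joint_event_frequency(n, x, y, trials=100_000, seed=0):
    """Monte Carlo estimate of P(min > x, max <= y) for n Uniform(-1, 1) draws."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    hits = 0
    for _ in range(trials):
        sample = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        # The event holds exactly when every draw lies in (x, y].
        if min(sample) > x and max(sample) <= y:
            hits += 1
    return hits / trials
```

Comparing `joint_event_frequency(4, -0.5, 0.6)` with `predicted(4, -0.5, 0.6)` should show agreement up to Monte Carlo noise.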

From $(0)$ one can deduce that the range $R= X_{(n)} - X_{(1)}$ has density $$ f_{R}(r)=n(n-1)\int_{-\infty}^\infty (F(u+r)-F(u))^{n-2} f(u+r)f(u)\, du $$ for $r>0$. Specialize to your problem.
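For the uniform case, $f = 1/2$ and $F(y) = (y+1)/2$ on $[-1,1]$, so the integrand is constant for $-1 < u < 1-r$ and the integral should reduce to $\frac{n(n-1)}{4}\left(\frac{r}{2}\right)^{n-2}(2-r)$ for $0<r<2$. A rough numerical check of that specialization, using a plain midpoint rule (the helper names and step count are assumptions made for illustration):

```python
def uniform_pdf(t):
    """Density of Uniform(-1, 1)."""
    return 0.5 if -1.0 <= t <= 1.0 else 0.0


def uniform_cdf(t):
    """CDF of Uniform(-1, 1), clipped to [0, 1] outside the support."""
    return min(1.0, max(0.0, (t + 1.0) / 2.0))


def range_density_numeric(r, n, pdf, cdf, lo, hi, steps=20_000):
    """Midpoint-rule value of n(n-1) * integral over [lo, hi] of
    (F(u+r) - F(u))^(n-2) * f(u+r) * f(u) du."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        u = lo + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += (cdf(u + r) - cdf(u)) ** (n - 2) * pdf(u + r) * pdf(u)
    return n * (n - 1) * total * h


def range_density_closed_form(r, n):
    """Closed form for Uniform(-1, 1); zero outside 0 < r < 2."""
    if not 0.0 < r < 2.0:
        return 0.0
    return n * (n - 1) / 4.0 * (r / 2.0) ** (n - 2) * (2.0 - r)
```

Calling `range_density_numeric(0.8, 4, uniform_pdf, uniform_cdf, -1.0, 1.0)` and `range_density_closed_form(0.8, 4)` should give nearly the same number; the integration limits only need to cover the support of $f$.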

Second answer

To approach this problem, one needs to find the joint distribution of the 1st and $n$-th order statistics of the sample. The two quantities are not independent of each other, so we need to be careful with the derivation. We ask the question "what is the probability of the event $Y_{(1)}<t_1,Y_{(n)}<t_2$?" We can answer this question, since we know the joint probability distribution of the dataset $Y_1,\dots, Y_n$. Suppose that $Y_i$ is the minimum and $Y_j$ is the maximum for some $i\neq j$. Summing over all such pairs, one can show that the event probability is given by

$$P(Y_{(1)}<t_1,Y_{(n)}<t_2)=\sum_{i\neq j}P\Big(Y_i<t_1,\ Y_i<Y_j<t_2,\ \bigcap_{k\notin\{i,j\}} \{Y_i<Y_{k}<Y_j\}\Big)$$

With this, the event has been transformed into an expression that can be computed using the joint distribution of the dataset, which reads

$$P(Y_{(1)}<t_1,Y_{(n)}<t_2)=n(n-1)\int_{-\infty}^{\min(t_1,t_2)}dy_1\,f(y_1)\int_{y_1}^{t_2}dy_2\,f(y_2)\left(\int_{y_1}^{y_2}f(y)\,dy\right)^{n-2}$$

From this one can also obtain the joint probability density of the two statistics by taking derivatives in $t_1,t_2$:

$$\begin{align}\rho_{Y_{(1)}, Y_{(n)}}(t_1,t_2)&=\frac{\partial^2}{\partial t_1\partial t_2}P(Y_{(1)}<t_1,Y_{(n)}<t_2)\\&=n(n-1)f(t_1)f(t_2)(F(t_2)-F(t_1))^{n-2}\theta(t_2-t_1)\end{align}$$

with $f$ the PDF from which the data is drawn, $F$ the corresponding CDF, and $\theta$ the Heaviside step function. Now it is easy to compute the distribution of the range variable $U$, since

$$\rho_U(u)=\int_{-\infty}^{\infty}dt_1\int_{t_1}^{\infty}dt_2\,\delta(t_2-t_1-u)\,\rho_{Y_{(1)}, Y_{(n)}}(t_1,t_2)=\int_{-\infty}^{\infty}dt_1\,\rho_{Y_{(1)}, Y_{(n)}}(t_1,t_1+u)$$

A quick calculation shows that for $Y_i\sim Uni(-1,1)$, where $f=1/2$ and $F(t)=(t+1)/2$ on $[-1,1]$, one gets the expression

$$\rho_{U}(u)=\frac{n(n-1)}{4}\left(\frac{u}{2}\right)^{n-2}(2-u)\theta(u)\theta(2-u)$$
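This result can be checked empirically. Integrating the density gives the CDF $P(U\le u) = n\left(\frac{u}{2}\right)^{n-1} - (n-1)\left(\frac{u}{2}\right)^{n}$ for $0\le u\le 2$, which can be compared against simulated ranges. The sketch below is illustrative only; the function names, trial count and seed are assumptions.

```python
import random


def range_density(u, n):
    """Density of the range of n Uniform(-1, 1) draws; zero outside (0, 2)."""
    if not 0.0 < u < 2.0:
        return 0.0
    return n * (n - 1) / 4.0 * (u / 2.0) ** (n - 2) * (2.0 - u)


def range_cdf(u, n):
    """CDF obtained by integrating range_density from 0 to u."""
    v = min(1.0, max(0.0, u / 2.0))
    return n * v ** (n - 1) - (n - 1) * v ** n


def empirical_range_cdf(u, n, trials=100_000, seed=0):
    """Fraction of simulated samples whose range max - min is at most u."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    count = 0
    for _ in range(trials):
        sample = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if max(sample) - min(sample) <= u:
            count += 1
    return count / trials
```

For example, `empirical_range_cdf(1.2, 5)` should land close to `range_cdf(1.2, 5)`, with a discrepancy on the order of the Monte Carlo error.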