Joint pdf of the two highest values extracted from a uniform distribution

An observable $x$ follows a uniform distribution $U([0,m])$, where $m$ is a known parameter. Given $N$ independent observations, we retain only the two highest values (call them $x_1$ for the highest and $x_2$ for the second highest). My goal is to determine the joint probability density $P(x_1,x_2)$ of $x_1$ and $x_2$. By definition, I can express this joint density as \begin{equation} P(x_1,x_2)=P(x_1)P(x_2|x_1)=P(x_2)P(x_1|x_2), \end{equation} where

  • $P(x_1)=\frac{N}{m^N}x_1^{N-1}$ is the probability density of the highest value,
  • $P(x_2)=\frac{N(N-1)}{m^N}x_2^{N-2}(m-x_2)$ is the probability density of the second highest value,
  • $P(x_1|x_2)$ and $P(x_2|x_1)$ are the conditional densities.

How can I determine these two conditional densities? Alternatively, is there a more direct way to calculate the joint density?

There are 3 best solutions below

2
On BEST ANSWER

Firstly, one should properly formulate the problem. Let $Y_{1},...,Y_{N}$ be iid uniform$[0,m]$ random variables.

Let $X_{1},...,X_{N}$ be the $Y_{i}$'s arranged in ascending order. (These are called the order statistics.)

Notice that the joint pdf of $Y_{1},...,Y_{N}$ is given by $f(y_{1},...,y_{N})=\displaystyle\frac{1}{m^{N}}\,,y_{i}\in[0,m]$

Now take distinct values $y_{1},...,y_{N}$. For each of the $N!$ permutations $(y_{\sigma(1)},...,y_{\sigma(N)})$ of these values, if $Y_{1}=y_{\sigma(1)},...,Y_{N}=y_{\sigma(N)}$, then the ordered outcome is the same: $X_{1}=y_{\sigma_{0}(1)},...,X_{N}=y_{\sigma_{0}(N)}$, where $\sigma_{0}$ is the unique permutation that arranges $y_{1},...,y_{N}$ in ascending order.

Hence, you get that the joint pdf of $X_{1},...,X_{N}$ is given by

$f_{X_{1},...,X_{N}}(y_{1},...,y_{N})=\dfrac{N!}{m^{N}}\,,\quad 0<y_{1}<y_{2}<...<y_{N}<m$

Another way to see this: $f_{X_{1},...,X_{N}}(x_{1},...,x_{N})$ is nonzero only when $0<x_{1}<...<x_{N}<m$, and it is constant there. The volume of this region is $\frac{m^{N}}{N!}$, so the pdf must equal $\frac{N!}{m^{N}}$ on this region in order to integrate to 1.
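The volume claim can be checked numerically. Below is a minimal Python sketch (the function name `ordered_fraction` and the parameter values are illustrative assumptions, not part of the argument above). It draws points uniformly from $[0,m]^N$ and estimates the fraction that happen to be in ascending order, which should be close to $1/N!$.

```python
import math
import random

def ordered_fraction(n, m, trials, seed=0):
    """Estimate the fraction of [0, m]^n occupied by the region y_1 < ... < y_n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        point = [rng.uniform(0, m) for _ in range(n)]
        # Count the point if its coordinates are already in ascending order.
        if all(point[i] < point[i + 1] for i in range(n - 1)):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    n, m = 4, 2.0
    print("estimated fraction:", ordered_fraction(n, m, trials=200_000))
    print("1 / n!            :", 1 / math.factorial(n))
```

The two printed numbers should agree up to Monte Carlo noise; multiplying the fraction by $m^N$ gives the volume of the ordered region.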

Returning to the problem: to find $f_{X_{N-1},X_{N}}(y_{N-1},y_{N})$, integrate out $y_{1},...,y_{N-2}$ over the region $0<y_{1}<...<y_{N-2}<y_{N-1}$, i.e.

\begin{align}f_{X_{N-1},X_{N}}(y_{N-1},y_{N})&=\int_{0}^{y_{N-1}}\int_{y_{1}}^{y_{N-1}}\int_{y_{2}}^{y_{N-1}}\cdots\int_{y_{N-3}}^{y_{N-1}}\frac{N!}{m^{N}}\,dy_{N-2}dy_{N-3}...dy_{1}\\ &=\frac{N!}{m^{N}}\cdot \frac{y_{N-1}^{N-2}}{(N-2)!}\end{align}

which simplifies to $f_{X_{N-1},X_{N}}(y_{N-1},y_{N})=\frac{N(N-1)y_{N-1}^{N-2}}{m^{N}}\,,\quad 0<y_{N-1}<y_{N}<m$

To put it in a more readable notation,

$$f_{X_{N-1},X_{N}}(x,y)=\frac{N(N-1)x^{N-2}}{m^{N}}\,,0<x<y<m$$
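As a sanity check, this density can be compared against a simulation. The sketch below is only illustrative: the function names and the test point are assumptions of mine. Integrating the density over $0<x<a$, $x<y<b$ (with $a\le b\le m$) gives the joint CDF $\frac{N\,b\,a^{N-1}-(N-1)a^{N}}{m^{N}}$, and the code compares that closed form with the empirical frequency from sorted uniform samples.

```python
import random

def joint_cdf(a, b, n, m):
    """P(second largest <= a, largest <= b) for 0 <= a <= b <= m, obtained by
    integrating n(n-1) x^(n-2) / m^n over 0 < x < a, x < y < b."""
    return (n * b * a ** (n - 1) - (n - 1) * a ** n) / m ** n

def empirical_joint_cdf(a, b, n, m, trials, seed=0):
    """Estimate the same probability by brute-force simulation."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        sample = sorted(rng.uniform(0, m) for _ in range(n))
        second, largest = sample[-2], sample[-1]
        if second <= a and largest <= b:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    n, m, a, b = 5, 2.0, 1.2, 1.7  # arbitrary test point with a <= b <= m
    print("closed form:", joint_cdf(a, b, n, m))
    print("simulation :", empirical_joint_cdf(a, b, n, m, trials=200_000))
```

Setting $a=b=m$ in the closed form gives 1, which is a quick consistency check on the normalization.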

3
On

The Gandalfian solution strategy is good. It had a minor error at the end that is now fixed. Here is another way to do it:

Fix $m>0$ and $n\in \{2, 3, 4, ...\}$. Let $\{W_1, ..., W_n\}$ be i.i.d. $Unif[0,m]$. Let $Y=\max[W_1,...,W_n]$ and let $X$ be the second largest of the $W_i$ values.

Fix $x,y$ such that $0<x<y<m$. For sufficiently small $\delta>0$ we have that $[x, x+\delta]$ and $[y, y+\delta]$ are disjoint intervals contained in $[0,m]$. Then

$$P[W_1 \in [x, x+\delta], W_2 \in [y, y+\delta], W_i< x \quad \forall i\geq 3] = (\delta/m)^2 (x/m)^{n-2}$$

The above associates index 1 with the second max and index 2 with the max. However, there are $n(n-1)$ ways for this situation to happen (pick an index from the set $\{1, ..., n\}$ to associate with the second max, pick one of the remaining $n-1$ indices to associate with the max). So the PDF $f_{X,Y}(x,y)$ satisfies $$ f_{X,Y}(x,y)\delta^2 \approx n(n-1)(\delta/m)^2 (x/m)^{n-2}$$ Dividing by $\delta^2$ and taking $\delta\rightarrow 0$ gives $$ \boxed{f_{X,Y}(x,y) = \frac{n(n-1)}{m^n}x^{n-2}\quad \mbox{ for $0< x<y< m$}}$$ and of course we define the PDF to be zero when $(x,y)$ does not satisfy $0<x<y<m$.



You can verify that for all $m>0$ and all $n\in\{2, 3, 4, ...\}$ we have $$ \int_0^m\int_{x}^mf_{X,Y}(x,y)dydx =1$$ If you want to change the PDF support to $(x,y)$ that satisfy $0\leq x\leq y\leq m$, you can: the result is also a valid PDF, and since it differs only on a set of measure zero, no integral changes.
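The normalization can also be checked numerically. Here is a rough sketch using a nested midpoint rule; the function names and the step count are my own choices, and the result is only approximate (exact up to discretization error in $x$).

```python
def density(x, y, n, m):
    """Joint pdf of (second largest, largest): n(n-1) x^(n-2) / m^n on 0 < x < y < m."""
    if 0 < x < y < m:
        return n * (n - 1) * x ** (n - 2) / m ** n
    return 0.0

def total_mass(n, m, steps=300):
    """Approximate the double integral over 0 < x < y < m with a midpoint rule:
    outer integral over x in (0, m), inner integral over y in (x, m)."""
    hx = m / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * hx
        hy = (m - x) / steps  # inner step shrinks as x approaches m
        inner = sum(density(x, x + (j + 0.5) * hy, n, m) * hy for j in range(steps))
        total += inner * hx
    return total

if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, total_mass(n, m=2.0))
```

Each printed value should be close to 1, regardless of the choice of $m$.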

1
On

Another solution which I think is easier (using $b$ for the highest value and $a$ for the second highest):

The probability that the $n$ values, independently and uniformly distributed in $[0,m]$, are all $b$ or less is $\left(\frac b m\right)^n$, so the density for the highest of these is the derivative of that with respect to $b$, namely $$\frac{nb^{n-1}}{m^n}$$ when $0 \le b \le m$.

Conditioned on the highest being $b$, the remaining $n-1$ values are independent and uniformly distributed in $[0,b]$. The probability that they are all $a$ or less is $\left(\frac a b\right)^{n-1}$, so the conditional density for the highest of these remaining values is the derivative of that with respect to $a$, namely $$\frac{(n-1)a^{n-2}}{b^{n-1}}$$ when $0 \le a \le b$.

So the joint density is the product, namely $\frac{nb^{n-1}}{m^n} \frac{(n-1)a^{n-2}}{b^{n-1}}$ i.e. $$\frac{n(n-1)a^{n-2}}{m^n}$$ when $0 \le a \le b \le m$, as the other solutions have found.
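This factorization also suggests a way to sample the pair $(a,b)$ directly, without generating all $n$ values. The following Python sketch is an illustration under my own naming (`sample_top_two`, `mean_pair` and the brute-force comparison are assumptions, not part of the answer): it inverts the two CDFs $\left(\frac b m\right)^n$ and $\left(\frac a b\right)^{n-1}$ and compares sample means with a brute-force simulation.

```python
import random

def sample_top_two(n, m, rng):
    """Draw (a, b) = (second highest, highest) of n uniforms on [0, m]
    by inverse-CDF sampling of the marginal and the conditional."""
    b = m * rng.random() ** (1 / n)        # inverts the CDF (b/m)^n of the highest
    a = b * rng.random() ** (1 / (n - 1))  # inverts the conditional CDF (a/b)^(n-1)
    return a, b

def sample_top_two_brute_force(n, m, rng):
    """Draw all n uniforms and pick out the two largest."""
    values = sorted(rng.uniform(0, m) for _ in range(n))
    return values[-2], values[-1]

def mean_pair(sampler, n, m, trials, seed=0):
    """Average (a, b) over repeated draws from the given sampler."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sum_a = sum_b = 0.0
    for _ in range(trials):
        a, b = sampler(n, m, rng)
        sum_a += a
        sum_b += b
    return sum_a / trials, sum_b / trials

if __name__ == "__main__":
    n, m = 5, 2.0
    print("inverse CDF:", mean_pair(sample_top_two, n, m, trials=200_000))
    print("brute force:", mean_pair(sample_top_two_brute_force, n, m, trials=200_000))
```

Both samplers should give means near $\frac{(n-1)m}{n+1}$ for $a$ and $\frac{nm}{n+1}$ for $b$, the standard means of the two largest order statistics of a uniform sample.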