Distribution of N smallest uniformly distributed random variables

Question

Distribution of N smallest uniformly distributed random variables

101 Views Asked by Bumbble Comm At 28 Mar 2026 - 8:13

What I want to achieve?

Let's say, I have a set of $N$ samples of a uniformly distributed random variable. For simplicity, they fall into the range $[0,1)$. If I picked a random point $a \in N$ and a set of $K\subset N\backslash \{a\}$ other points in that interval, what is the probability that a given set $K$ holds all the closest points to point $a$?

In my concrete case I'm interested in $|K| = 20$

What (I think) I found out

The closest points to $a$ follow beta distributions (order statistics)
The probability that we find $x$ peers in a certain interval follows the poisson distribution

Idea

In the case of $|K| = 20$. I thought I could take the average distance to $a$ of all $K$ keys and compare it to the beta distribution of the 10.5th closest key.

I did the following simulation to familiarize myself:

import numpy as np

ROUNDS = 10_000
N = 15_000
nodes = np.random.uniform(0, 1, size=(ROUNDS,N))

nodes.sort()

averages = []
for n in nodes:
    averages += [np.average(n[:20])]

I plot the CDF of the averages together with the beta distribution with $\alpha=k$ and $\beta=N-k+1$ (according to Wikipedia).

This gives me the following graph:

My idea was that if I have a set of $K$ keys, I calculate their average distance to $a$ and just look up the corresponding CDF value. For example, if the average distance is $0.001$ the probability that the set contains the 20 closest keys is 1-CDF. In that case ~10%.

However, as you can see above, the distributions are similar but not the same.

Some Context

In the context of the Kademlia Distributed Hash Table there is a process that tries to find the $k=20$ closest nodes to a certain key $a$ according to the XOR metric. We can assume that node identities in such a hash table are uniformly distributed and live in the same key space as $a$. The process to find the closest nodes iteratively yields ever closer nodes and eventually returns the set of the 20 closest ones. However, this process can take quite long and I want to check if the process can be terminated earlier by calculating the probability that the set of 20 closest nodes I know (in the middle of the process) is already the closest one we can get.

Questions

What would be the correct distribution there?
Is my line of thought correct?
Is there a simpler method to achieve my goal?

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Answer 1 · 2022-08-03 21:00:22

Average distant doesn't matter that much: of you by luck got one element very close to $a$, this doesn't mean you can relax requirements on next.

Note that $K$ consists of points closest to $a$ iff the $\max_{x \in K} d(x, a) < \min_{x \in N \setminus K} d(x, a)$.

In other words, probability of $K$ been the correct set is exactly probability of rest $N \setminus K$ points been further away from $a$ than $\max_{x \in K} d(x, a)$.

In uniform case, it's simply $(1 - 2 \max_{x \in K} d(x, a))^{N - K}$ (assuming this $\max$ is less than $a$ and $1 - a$, otherwise modify correspondingly).

Distribution of N smallest uniformly distributed random variables

What I want to achieve?

What (I think) I found out

Idea

Some Context

Questions

There are 1 best solutions below

Related Questions in UNIFORM-DISTRIBUTION

Related Questions in POISSON-DISTRIBUTION

Related Questions in ORDER-STATISTICS

Related Questions in BETA-FUNCTION

Trending Questions

Popular # Hahtags

Popular Questions