Average shortest distance between some random points in a box


Suppose there is a square box with side length $m$ (measured in pixels). Let there be $n$ points in this box, distributed uniformly within the box (with integer coordinates, aligned to a pixel grid). If we take from each point the Euclidean distance to its nearest neighbor, what would be the expected value of this distance averaged over all points?

My actual problem is about a discrete pixel grid, an $m\times m$ bitmap image, but if that's easier I would be happy with a continuous solution. A more general solution, e.g. for a rectangle instead of a square box, is welcome, but at this point it is not necessary. I found similar questions about the continuous case, but without answers. It would not be easy for me to generalise from the case of two points only.

There are 2 best solutions below

Empirically it is between $\dfrac{0.5}{\sqrt{n-1}}$ and $\dfrac{0.56}{\sqrt{n-1}}$, tending towards the lower bound as $n$ increases. This is not very different from putting $n-1$ random points in a circle or square with area $1$ and finding the expected minimum distance to the centre.

At the bottom end, with $n=2$, the question becomes the expected Euclidean distance between two random points in a square, which an earlier answer gives as $\frac{1}{15}\left(2+\sqrt{2}+5\log(1+\sqrt{2})\right) \approx 0.5214$.
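As a quick sanity check of that constant (a Python sketch of my own, not part of the original argument; the seed and sample size are arbitrary choices):

```python
# Closed form for the mean distance between two uniform points in the
# unit square, compared against a direct Monte Carlo estimate.
import math
import random

closed_form = (2 + math.sqrt(2) + 5 * math.log(1 + math.sqrt(2))) / 15

random.seed(1)
trials = 200_000
total = 0.0
for _ in range(trials):
    # distance between (u1, v1) and (u2, v2), all coordinates uniform on [0, 1]
    total += math.hypot(random.random() - random.random(),
                        random.random() - random.random())
mc = total / trials

print(round(closed_form, 4), round(mc, 4))  # closed form ≈ 0.5214
```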

Simulation results look like this, both directly and after multiplying by $\sqrt{n-1}$:

[Plots: expected distance to the nearest neighbour against the number of points in the square, shown directly and scaled by $\sqrt{n-1}$]

with values of

n    expected dist nn     sqrt(n-1) times expected dist nn
2           0.521                   0.521
3           0.389                   0.550
4           0.322                   0.557
5           0.279                   0.558
6           0.250                   0.558
7           0.227                   0.557
8           0.210                   0.555
9           0.196                   0.553
10          0.184                   0.552
20          0.124                   0.540
50          0.075                   0.527
100         0.052                   0.519
200         0.036                   0.514
500         0.023                   0.509
1000        0.016                   0.507

The simulation code (in R) was

set.seed(1)          # to reproduce
maxn <- 1024         # top number for n-1 neighbours 
cases <- 10^6        # simulations 
x0 <- runif(cases)   # base points  
y0 <- runif(cases)
mindistsq <- rep(Inf, cases)
emindist <- numeric(maxn)
for (i in 1:maxn){
  x <- runif(cases)  # possible neighbour
  y <- runif(cases)
  distsq <- (x0-x)^2 + (y0-y)^2
  mindistsq <- pmin(mindistsq, distsq)  # running minimum of squared distances
  emindist[i] <- mean(sqrt(mindistsq))
  }
emindistsqrtn1 <- emindist * sqrt(1:maxn)   # multiplying by sqrt(n-1)
par(mfrow=c(2,1))
plot(2:(maxn+1), emindist, xlab="number of points in square", 
                     main="Expected distance of nearest neighbour in square",
                     ylab="Expected dist nearest neighbour",
               ylim = c(0,0.6))
plot(2:(maxn+1), emindistsqrtn1 , xlab="number of points in square", 
               ylab="sqrt(n-1) times expct dist nn",
               main="sqrt(n-1) times Expected distance of nearest neighbour",
               ylim = c(0.5,0.56))
par(mfrow=c(1,1))
exn1 <- c(1:9, 19, 49, 99, 199, 499, 999)
tab <- cbind(round(emindist[exn1],3), round(emindistsqrtn1[exn1],3))  
colnames(tab) <- c("expected min","sqrt(n-1) times exp min")
rownames(tab) <- (exn1+1) 
tab

For the continuous version, assume that you pick $n$ points randomly from $[0, 1]^2$. You are trying to find $$E\left[ \frac{1}{n}\sum_{k=1}^n \min_{i, 1 \le i \le n, i \not=k}||X_k-X_i|| \right] \tag 1$$

where each of $X_i$ are randomly and uniformly chosen from $[0, 1]^2$.

Because the expected value of a sum of random variables is equal to the sum of the expected values, this simplifies to $$ \frac{1}{n}\sum_{k=1}^n E\left[\min_{i, 1 \le i \le n, i \not=k}||X_k-X_i|| \right] \tag 2$$

By symmetry, this is $$E\left[\min_{i, 1 \le i \le n-1}||X_n-X_i|| \right] \tag 3$$ Let $F_n(x) = P\left(\min_{i, 1 \le i \le n-1}||X_n-X_i|| > x\right)$, which is $1$ for $x < 0$ and $0$ for $x > \sqrt{2}$. Then $(3)$ is $\int_0^{\sqrt{2}}F_n(x)\, dx$. The distances $||X_n-X_i||$ all share the point $X_n$, so they are independent only conditionally on $X_n$; treating them as unconditionally independent approximates $F_n(x)$ by $$P\left(||X_n-X_1|| > x\right)^{n-1} \tag 4$$

The probability $P\left(||X_n-X_1|| > x\right)$ is given using formula $(6)$ from this paper as $\int_x^{\sqrt{2}}g(t)dt$ where $$g(x) = \begin{cases} 2x^{3}-8x^{2}+2\pi x & 0 \leq x < 1 \\ -2x^{3}-\left(2\pi+4\right)x+8x\operatorname{arccsc}\left(x\right)+8x\sqrt{x^{2}-1} & 1\leq x < \sqrt{2} \end{cases} \tag 5$$

Therefore, $P\left(||X_n-X_1|| > x\right)$ is given by $$\begin{cases} -\frac{x^{4}}{2}+\frac{8x^{3}}{3}-\pi x^{2}+1 & 0 \leq x < 1 \\ \frac{2}{3}-\frac{4}{3}\left(2x^{2}+1\right)\sqrt{x^{2}-1}+\frac{1}{2}x^{4}+\left(2+\pi\right)x^{2}-4x^{2}\operatorname{arccsc}\left(x\right) & 1 \leq x < \sqrt{2} \end{cases} \tag 6$$
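As a sanity check on $(6)$ (my own Python sketch; `p_gt` is just a name I chose for the survival function), the two branches agree at $x=1$ with value $\frac{19}{6}-\pi$, and the function runs from $1$ at $x=0$ down to $0$ at $x=\sqrt{2}$:

```python
# Survival function (6) of the distance between two uniform points in
# the unit square; arccsc(x) = asin(1/x).
import math

def p_gt(x):
    if x < 1:
        return -x**4/2 + 8*x**3/3 - math.pi*x**2 + 1
    return (2/3 - (4/3)*(2*x**2 + 1)*math.sqrt(x**2 - 1)
            + x**4/2 + (2 + math.pi)*x**2 - 4*x**2*math.asin(1/x))

# Both branches meet at x = 1 with value 19/6 - pi.
print(round(p_gt(1.0), 5), round(19/6 - math.pi, 5))  # both ≈ 0.02507
```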

The integral $\int_0^{\sqrt{2}}P\left(||X_n-X_1|| > x\right)^{n-1}dx$ is unlikely to have a closed form (or if it does, it would probably be very complicated). Instead, for an approximation, split it up as $$\int_0^{1}P\left(||X_n-X_1|| > x\right)^{n-1}dx + \int_1^{\sqrt{2}}P\left(||X_n-X_1|| > x\right)^{n-1}dx$$

The second integral has an upper bound of $\left(\frac{19}{6}-\pi\right)^{n-1}\left(\sqrt{2}-1\right)$, because the integrand on that interval is at most $P\left(||X_n-X_1|| > 1\right)^{n-1} = \left(\frac{19}{6}-\pi\right)^{n-1}$. The first integral is given exactly by the multinomial expansion $$\sum_{k=0}^{n-1}\sum_{m=0}^{n-1-k}\sum_{r=0}^{n-1-k-m}\frac{\binom{n-1}{k,\,m,\,r}}{1+2k+3m+4r}\left(-\pi\right)^{k}\left(\frac{8}{3}\right)^{m}\left(-\frac{1}{2}\right)^{r}$$ where $\binom{n-1}{k,m,r}=\frac{(n-1)!}{k!\,m!\,r!\,(n-1-k-m-r)!}$, so in particular $r$ runs only up to $n-1-k-m$.
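The triple sum can be cross-checked against direct numerical integration of the first integral (a Python sketch of mine; $n=4$ and the step count are arbitrary test choices, and note $r$ must stop at $n-1-k-m$ so the multinomial coefficient is well-defined):

```python
# Compare the multinomial triple sum with a midpoint-rule estimate of
# the integral of P(||X_n - X_1|| > x)^(n-1) over [0, 1].
import math

def p_lt1(x):
    """First branch of (6), valid on [0, 1]."""
    return -x**4/2 + 8*x**3/3 - math.pi*x**2 + 1

def first_integral_sum(n):
    """The triple sum, with r running up to n-1-k-m."""
    fact = math.factorial
    total = 0.0
    for k in range(n):
        for m in range(n - k):
            for r in range(n - k - m):
                coeff = fact(n - 1) // (fact(k) * fact(m) * fact(r)
                                        * fact(n - 1 - k - m - r))
                total += (coeff * (-math.pi)**k * (8/3)**m * (-0.5)**r
                          / (1 + 2*k + 3*m + 4*r))
    return total

def first_integral_quad(n, steps=100_000):
    """Midpoint-rule estimate of the same integral."""
    h = 1.0 / steps
    return h * sum(p_lt1((i + 0.5) * h)**(n - 1) for i in range(steps))

s, q = first_integral_sum(4), first_integral_quad(4)
print(s, q)  # the two values agree to many decimal places
```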

Numerically, this approaches $\frac{1}{2\sqrt{n-1}}$, but I'm having trouble proving it.
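For numerical support (again a rough sketch of mine; midpoint rule, with `expected_min` my own name and the step count arbitrary), $\sqrt{n-1}$ times the full integral decreases towards $\frac12$ as $n$ grows:

```python
# Evaluate sqrt(n-1) * Integral_0^sqrt(2) P(||X_n - X_1|| > x)^(n-1) dx
# for growing n; the scaled values drift down towards 1/2.
import math

def p_gt(x):
    """Survival function (6) of the distance between two uniform points."""
    if x < 1:
        return -x**4/2 + 8*x**3/3 - math.pi*x**2 + 1
    return (2/3 - (4/3)*(2*x**2 + 1)*math.sqrt(x**2 - 1)
            + x**4/2 + (2 + math.pi)*x**2 - 4*x**2*math.asin(1/x))

def expected_min(n, steps=100_000):
    """Midpoint-rule estimate of the integral of p_gt^(n-1) over [0, sqrt(2)]."""
    b = math.sqrt(2)
    h = b / steps
    return h * sum(p_gt((i + 0.5) * h)**(n - 1) for i in range(steps))

vals = [math.sqrt(n - 1) * expected_min(n) for n in (10, 100, 1000, 10000)]
print([round(v, 4) for v in vals])  # decreasing, staying above 0.5
```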