What is the relationship between average distance between 2 random points in a cube, with min and max?

3.5k Views Asked by At

I wrote a computer program to simulate the minimum, average, and maximum distance between $2$ random points in a unit length cube (such as $1$ cubic foot). The minimum possible distance is $0$ and the maximum possible distance is about $1.73$ (Square root of $3$), but what I am wondering is why is the average distance somewhere around $0.66$? I am trying to figure out what is the relationship between that average distance and the max. For example, if we have points $(0,0,0)$ and $(1,1,1)$, that is a distance of about $1.73$, the maximum possible. If we instead have $(0,0,0)$, $(.5, .5, .5)$. That is a distance of about $0.866$.

All I am looking for here is some explanation of why the average distance is about $0.66$ units and what the relationship is to the max. A fair question is why is the average distance less than half of the max distance when it seems the average distance between each coordinate component (x,y,z) should be around $0.5$? I suppose if you assume the smaller of (x1,x2) is already more than $0.5$, then the max x difference will be less than $0.5$ so I assume it is related to the probability that one of more of the 3 dimensional point coordinates has a lower value already greater than $0.5$ (half of the unit size of the cube).

Perhaps this can be approximated by using probability that the lower value of all of x,y,z will be greater than or less than $0.5$ and also coming up with some type of average when only $1$ dimension changes and then extrapolating that to all $3$ dimensions. I was a little surprised when I got about $0.66$. I will get a good average, using $1$ billion pairs of points.

3

There are 3 best solutions below

2
On BEST ANSWER

Firstly note that "random" is meaningless without specifying a distribution. I'll assume that you mean "uniformly random".

It seems the average distance between each coordinate component (x,y,z) shouold be around 0.5.

Why should this be? If you choose two uniformly random points $p,q$ from $[0,1]$, if $p = 0$ then indeed the average distance between them is $\frac12$, but if $p = \frac12$ then the average distance is only $\frac14$. In general, the average distance given $p$ is $\int_0^1 |q-p|\ dq = \frac12(p^2+(1-p)^2)$, and hence the overall average distance is $\int_0^1 \frac12(p^2+(1-p)^2)\ dp = [\frac16(p^3-(1-p)^3)]_0^1 = \frac13$.

As for the original problem, you want average distance, but that cannot be computed from the average distance in each coordinate, because you want Euclidean distance and not rectilinear distance. Note that $\frac1{\sqrt{3}}(a+b+c) \le \sqrt{a^2+b^2+c^2} \le a+b+c$ for any non-negative reals $a,b,c$, so we know that the average distance is at least $\frac1{\sqrt{3}} \approx 0.577$ and is at most $1$, since expectation is linear. This bound isn't tight, of course. The right-hand inequality is particularly loose.

The exact answer is:

$\int_0^1 \int_0^1 \int_0^1 \int_0^1 \int_0^1 \int_0^1 \sqrt{(p_1-q_1)^2+(p_2-q_2)^2+(p_3-q_3)^2}\ dp_1\ dp_2\ dp_3\ dq_1\ dq_2\ dq_3$.

Which Jack has already provided a numerical value for.

1
On

The average distance in a unit cube is given by the multiple integral $$ I=\int_{[0,1]^6}\sqrt{(x-X)^2+(y-Y)^2+(z-Z)^2}\,d\mu $$ whose numerical value is around $\color{red}{0.6617}$. You simulations are correct.

We may also notice that the average value of $(x-X)^2$ is $\frac{1}{6}$, hence the inequality $$ I \leq \sqrt{\frac{1}{6}+\frac{1}{6}+\frac{1}{6}}=\frac{1}{\sqrt{2}} $$ holds by convexity. By using tails approximations it can be checked that it is not terribly crude, so the average distance is expected to be closer to $0.7$ than to $0.5$ also without explicit computations.

3
On

In one dimension, the minimum distance is $0$ and the maximum is $1$. The average is less than $\frac 12$ because it is hard to get large distances. One point needs to be near one end of the interval and the other needs to be near the other. If we call the points $x,y$, the average distance is $$\int_0^1 \int_0^1 |x-y| \;dx \;dy=2\int_0^1 \int _y^1 x-y \; dx \; dy\\ =2\int_0^1(\frac {x^2}2-xy)|_y^1\;dy\\=2\int_0^1(\frac {1-y^2}2-y+y^2)dy\\=2(\frac y2+\frac {y^3}6-\frac {y^2}2)|_0^1\\=\frac 13$$ where the first equality comes from integrating only over the triangle where $x\ge y$ and doubling the result by symmetry. The same effect makes the average in three dimensions less than the midpoint of $0$ and the maximum.