What is the mathematical principle that explains the shape of a curve that depicts the mean distance between points on a grid and every other point?

46 Views Asked by At

I'm wondering if someone could give me a high-school-level explanation of the mathematical principle at work behind this. I was curious about this, so I designed an experiment to test it, and I'm wondering about the results.

Suppose I have a bounded 2D coordinate grid that is a 100 x 100 square, so (0,1), (0,2) ... (99,99).

Then, for each point, I calculate the mean Euclidean distance between that point and every other point.

Then I sort the points by their mean distance to every other point in the grid and plot it.

This is what a bar plot looks like for a 100x100 grid, with the x-axis representing unique coordinates and the y-axis representing the mean Euclidean distance between that coordinate and every other point in the grid:

The plot shows a characteristic curve, where the middle section is essentially linear, the low end curves down somewhat, and the high end curves up more distinctly.

What causes this?

EDIT BASED ON @Eric's RESPONSE:

If the corners of the grid are the cause of the accelerations at each end of the plot, then wouldn't it follow that the mean Euclidean distances between points in a cornerless grid (i.e., a circle) would not have these?

I created a circle of radius 10 around the origin, generated random radii and angles to create random points within that circle, which I rounded to one place beyond the decimal. I randomly generated 500,000 such points, calculated their distances from one another, and plotted those results. Here's what it looks like:

The extremes actually become more pronounced. Was I incorrect to infer that eliminating the corners would eliminate (or reduce) the bends at the end of the plot?

1

There are 1 best solutions below

2
On

The basic idea is that points tend to be spread out mostly uniformly except for the corners which is farther from everything and the center which is closer to everything, so you’ll get a flattish curve with spikes on the ends.

It’s not particularly nice to show this exactly.

You’ll often find slightly nicer answers if you use a continuous approximation.

Using Wolfram alpha, I get that:

$$ f(x,y)=\int \int \sqrt{x^2+y^2}dx dy=1/12 ((4 x y^3)/\sqrt{x^2 + y^2} + 2 y^3 log(\sqrt{x^2 + y^2} + x) + (x^4 \sqrt{y^2/x^2 + 1}\sinh^{-1}(y/x))/\sqrt(x^2 + y^2) + (4 x^3 y)/\sqrt{x^2 + y^2} + 3 x^3 log(\sqrt{x^2 + y^2} + y) - 2 x^3 \tanh^(-1)(y/\sqrt{x^2 + y^2}) - (2 y^3)/3)$$

Then, the mean distance from $(a,b)$ to every other point in a unit square is is $$\int_0^1 \int^1_0 \sqrt{(x-a)^2+(y-b)^2}dxdy=f(1-a,1-b)-f(-a,1-b)-f(1-a,-b)+f(-a,-b))$$ This at least gives you an analytic solution.

Another tweak to the problem that happens to make it much nicer without changing the essence is to look at the average square distanc between the point and everything else. You’ll get that the average of the squares distances from a point to a collection of points equals the sum of the average of the squared distances of the center of mass to all of the points plus the distance squared of the point in question to that center of mass. In other words, you’ll get a constant plus the distance squared to the center. Since the number of points within $r$ from the center is $\pi r^2$, you’ll get that there’s a linear relationship between the number of points and the average squared distance for small distances. When the distances become large, the circle hits the edge leaving only the corners, so the density decreases.