Measuring Distinctness of Points on a Line


I have a line with endpoints $0$ and $100$ containing a set $S$ of $k$ points. I know nothing in advance about the values in $S$ except that they sum to $100k/2$ (i.e. their mean is $50$).

I am trying to find (or create) a measure of "distinctness" for the set of points so that I can numerically compare it to other sets of $k$ points on the same line.

I've been struggling to come up with a good definition of distinctness, but I hope the concept is clear from a couple of examples:

  1. $S$ is maximally distinct (i.e. its distinctness value is 100%) if the $k$ points are equally spaced and span the entire line (i.e. at points $0, \frac{100}{k-1}, \ldots, \frac{100(k-2)}{k-1}, 100$).

  2. $S$ is minimally distinct (i.e. its distinctness value is 0%) if all $k$ points share the same value (i.e. 50).
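The two reference configurations can be generated directly from the definitions above. This is a small sketch; the function names are hypothetical. Both sets satisfy the constraint that the values sum to $100k/2$, since equally spaced points on $[0, 100]$ have mean $50$.

```python
from __future__ import annotations


def max_distinct_points(k: int, length: float = 100.0) -> list[float]:
    """Hypothetical helper: k equally spaced points spanning [0, length] (the 100% case)."""
    if k < 2:
        raise ValueError("need at least two points to span the line")
    step = length / (k - 1)
    return [i * step for i in range(k)]


def min_distinct_points(k: int, length: float = 100.0) -> list[float]:
    """Hypothetical helper: k copies of the midpoint (the 0% case)."""
    return [length / 2] * k


print(max_distinct_points(5))  # [0.0, 25.0, 50.0, 75.0, 100.0]
print(min_distinct_points(3))  # [50.0, 50.0, 50.0]
```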

What I am trying to do here is calculate the separation of the points. I have tried using the average of the distances between adjacent points, but that treats $S=[0,40,100]$ the same as $S=[0,50,100]$, and the latter should be more distinct. I have also tried the minimum distance between two adjacent points, but that can't distinguish between $S=[0,10,15]$ and $S=[0,20,25]$, and again the latter should be more distinct.
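The two failed attempts can be reproduced in a few lines. This is a sketch with hypothetical function names. The mean gap telescopes to $(\max S - \min S)/(k-1)$, so it depends only on the two endpoints and ignores where the interior points sit. The minimum gap looks only at the single closest pair.

```python
def sorted_gaps(points):
    """Distances between adjacent points after sorting."""
    xs = sorted(points)
    return [b - a for a, b in zip(xs, xs[1:])]


def mean_gap(points):
    """Average adjacent distance; equals (max - min) / (k - 1)."""
    gaps = sorted_gaps(points)
    return sum(gaps) / len(gaps)


def min_gap(points):
    """Smallest adjacent distance."""
    return min(sorted_gaps(points))


print(mean_gap([0, 40, 100]), mean_gap([0, 50, 100]))  # 50.0 50.0
print(min_gap([0, 10, 15]), min_gap([0, 20, 25]))      # 5 5
```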

I suspect the answer lies in some combination of the average of the distances between adjacent points and the standard deviation (or variance) of the same, but I can't figure out a formula that makes sense.

I don't know if this is a genuinely new problem or if I just don't know the right keyword to search, but advice would be appreciated.

Answer


You can use the distances between adjacent points, but you need to apply a concave function to them first. For instance, sort the points as $x_1 \le x_2 \le \dots \le x_k$, let $d_i = x_{i+1} - x_i$ be the distance between the $i$th and $(i+1)$-st point, and take $\sum_{i=1}^{k-1} \sqrt{d_i}$. Because the square root is concave, for a fixed smallest and largest point this sum attains its maximum when the points are equally spaced between them. It is also strictly increasing in each gap: widening any one gap while keeping the others fixed increases the sum. This is far from the only solution, though. There are more subtle questions, such as whether $[0,10,20]$ or $[0,19,21]$ should be more distinct.
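Below is a minimal Python sketch of the answer's square-root measure. The scaling to a 0-to-1 range is an addition that is not part of the answer. By Cauchy-Schwarz, $\sum_{i=1}^{k-1}\sqrt{d_i} \le \sqrt{(k-1)\sum_{i=1}^{k-1} d_i} = \sqrt{(k-1)(x_k - x_1)} \le \sqrt{100(k-1)}$. Equality holds exactly when all gaps are equal and the points span $[0, 100]$. Dividing by $10\sqrt{k-1}$ therefore sends case 1 of the question to $1$ and case 2 (all gaps zero) to $0$. The function names `sqrt_gap_sum` and `distinctness` are hypothetical.

```python
import math


def sqrt_gap_sum(points):
    """Sum of square roots of the gaps between adjacent sorted points."""
    xs = sorted(points)
    return sum(math.sqrt(b - a) for a, b in zip(xs, xs[1:]))


def distinctness(points, length=100.0):
    """sqrt_gap_sum scaled by its upper bound sqrt(length * (k - 1)).

    The result is 1 only for equally spaced points spanning [0, length]
    and 0 only when all points coincide. (This scaling is an assumption.)
    """
    k = len(points)
    if k < 2:
        raise ValueError("need at least two points")
    return sqrt_gap_sum(points) / math.sqrt(length * (k - 1))


print(distinctness([0, 40, 100]) < distinctness([0, 50, 100]))  # True
print(distinctness([0, 10, 15]) < distinctness([0, 20, 25]))    # True
print(distinctness([0, 19, 21]) < distinctness([0, 10, 20]))    # True
print(distinctness([50, 50, 50]))                               # 0.0
```

This measure separates both pairs that defeated the mean gap and the minimum gap. On the answer's open question, it ranks $[0,10,20]$ above $[0,19,21]$, since $2\sqrt{10} > \sqrt{19} + \sqrt{2}$. Whether that ordering is the desired one is exactly the judgment call the answer leaves open. The function only looks at the gaps, so it does not use the sum-to-$100k/2$ constraint.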