How to measure how evenly points are distributed

I hope you will be patient with the inarticulate question of a non-mathematician. It's hard to get an answer when you don't even know how to ask the question. But here goes...

...Actually, I have two distinct but related questions. Firstly, I should like to be able to put a 'score' on how well a series of points on a line segment is distributed between its beginning and its end. So, if the points were clumped together rather than well spread along the line, the score would be low; conversely, if the points were well distributed, the score would be high.

Secondly, I should also like to do the same for a series of points inside a triangle. So, how can I produce an effective measure of how well the points are distributed within the triangle?

Is there an algorithm/formula that would do this? I'm very conscious that I have given a very poor explanation of what I am looking for.

Edit:

Could the Kolmogorov–Smirnov test be a good measure of how evenly points are distributed along a line?
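As a sketch of what the Kolmogorov–Smirnov idea would compute, assuming the points have been rescaled to lie in [0, 1] and that "perfectly even" is taken to mean "uniform on [0, 1]": the KS statistic $D$ is the largest vertical gap between the empirical CDF of the points and the straight-line uniform CDF, so a smaller $D$ means a more even spread, and $1 - D$ could serve as a score where higher is better. The function name `ks_uniform_statistic` is hypothetical.

```python
def ks_uniform_statistic(points):
    """Kolmogorov-Smirnov distance between the empirical CDF of `points`
    and the uniform CDF on [0, 1]. Smaller means more evenly spread.
    Assumes every point already lies in [0, 1]."""
    xs = sorted(points)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one point")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # At x the empirical CDF jumps from (i - 1)/n to i/n,
        # while the uniform CDF equals x; check both sides of the jump.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d
```

For example, `[0.25, 0.75]` gives 0.25 while the clumped `[0.5, 0.5]` gives 0.5. For $N$ points the smallest possible value is $1/(2N)$, reached at the midpoints $(2i-1)/(2N)$. If SciPy is available, `scipy.stats.kstest(points, 'uniform')` should report the same statistic together with a p-value.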

This is not really an answer but an extended comment, in the hope of pointing you in the right direction. I'm sure others can help more, so this is a community wiki; feel free to edit.

For simplicity, let's assume you always have a fixed number of points, $N$. In your first case, let's also assume, without loss of generality, that the points lie between zero and one. A "distribution" in this case is just an $N$-vector $x=[x_1,x_2,\dots,x_N]$, where $x_1$ is the location of the smallest point, $x_2$ is the location of the second smallest point, and so on. You are looking for an ordering that compares the clumpiness/concentration of two different distributions. You are likely to find many such orderings, and some may reflect what you really want better than others. Some examples are:
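The "without loss of generality" step can be sketched as follows, assuming the points are given as coordinates along a segment that runs from `start` to `end` (the function and parameter names are hypothetical). Rescaling maps the segment onto [0, 1], and sorting produces the vector $x=[x_1,\dots,x_N]$ described above.

```python
def as_distribution(points, start, end):
    """Rescale coordinates on the segment [start, end] to [0, 1] and sort
    them, giving the vector x_1 <= x_2 <= ... <= x_N."""
    length = end - start
    if length <= 0:
        raise ValueError("end must be greater than start")
    return sorted((p - start) / length for p in points)
```

For example, `as_distribution([5, 1, 3], 1, 5)` returns `[0.0, 0.5, 1.0]`.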

  1. the variance (the average squared distance of the points from their mean)
  2. majorization [http://en.wikipedia.org/wiki/Majorization] (in economics we use it to measure income inequality, where it is called the Lorenz order)
  3. An ad-hoc idea (that also works in two dimensions), which has probably been formalized elsewhere already, is this: pick any small number $\varepsilon >0$ and count the minimum number of balls of radius $\varepsilon$ needed to cover your set of points $x_1,\dots,x_N$. Call this number $\texttt{clumps}(x,\varepsilon)$. If, for another set of points $y_1,\dots,y_N$, you have $\texttt{clumps}(x,\varepsilon)>\texttt{clumps}(y,\varepsilon)$, we say $y$ is more concentrated and $x$ is more dispersed. If there is a tie, increase $\varepsilon$ until the tie is broken.
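The three measures above can be sketched in Python roughly as follows, assuming the points have already been rescaled to [0, 1]. All function names are hypothetical, and two choices are assumptions rather than part of the list: majorization is applied to the gaps between consecutive points (including the ends 0 and 1), so that equal gaps are majorized by every other gap vector and the most even configuration sits at the bottom of the order; and in two dimensions (e.g. points in a triangle) finding the exact minimum number of discs is NP-hard in general, so the sketch only returns a greedy upper bound.

```python
import statistics


def variance_score(points):
    """Item 1: population variance of the points (assumed in [0, 1])."""
    return statistics.pvariance(points)


def gaps(points):
    """Assumption: spacings between consecutive points, including the two
    ends 0 and 1. The N + 1 gaps always sum to 1."""
    xs = [0.0] + sorted(points) + [1.0]
    return [b - a for a, b in zip(xs, xs[1:])]


def majorizes(a, b, tol=1e-12):
    """Item 2: True if vector a majorizes vector b (same length, same sum),
    i.e. every partial sum of a, sorted in decreasing order, is >= the
    corresponding partial sum of b. Applied to gaps, a majorizing gap
    vector means a more clumped set of points."""
    if len(a) != len(b) or abs(sum(a) - sum(b)) > tol:
        raise ValueError("vectors must have equal length and equal sum")
    total_a = total_b = 0.0
    for u, v in zip(sorted(a, reverse=True), sorted(b, reverse=True)):
        total_a += u
        total_b += v
        if total_a < total_b - tol:
            return False
    return True


def clumps_1d(points, eps):
    """Item 3 on a line: minimum number of intervals of radius eps
    (length 2 * eps) covering the points. Sweeping left to right and
    starting a new interval at the first uncovered point is optimal."""
    count = 0
    covered_up_to = float("-inf")
    for x in sorted(points):
        if x > covered_up_to:
            count += 1
            covered_up_to = x + 2 * eps
    return count


def clumps_2d_upper_bound(points, eps):
    """Item 3 in the plane: greedily centre a disc of radius eps on an
    uncovered point and discard everything it covers. This yields a valid
    cover, so the count is only an upper bound on the true minimum."""
    uncovered = list(points)
    count = 0
    while uncovered:
        cx, cy = uncovered[0]
        uncovered = [(x, y) for x, y in uncovered
                     if (x - cx) ** 2 + (y - cy) ** 2 > eps ** 2]
        count += 1
    return count
```

With `even = [0.25, 0.5, 0.75]` and `clumped = [0.45, 0.5, 0.55]`, the variance of `even` is larger, `majorizes(gaps(clumped), gaps(even))` is `True` while the reverse is `False`, and `clumps_1d(clumped, 0.1)` is 1 while `clumps_1d(even, 0.1)` is 3. Note that majorization is only a partial order: for some pairs of point sets neither gap vector majorizes the other.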