The Largest Gap

136 Views Asked by At

If I have an X amount of randomly generated positive numbers, what type of algorithm could I run to find the following:

  • Precisely where the largest difference exists between two of the numbers?

  • How much is that difference?

  • Where does the smallest gap exist?

  • How much is that difference?

  • Generate a list of all differences and how many occurrences there are of each difference

It has been a while since I have taken any sort of math class, so pardon the potentially inaccurate tags.

Ideally I'm looking for a way to generate a list of all the differences, and then place those differences into a descending order. Once the list of all the differences into a descending order is created, we still need to be able to identify where in the original set of numbers, each particular difference takes place.

1

There are 1 best solutions below

1
On

If you want a list of all differences, then two sortings are called for: the first sort orders the random sample of size $X$, and the second sort orders the resulting $X-1$ differences of consecutive values (thereby grouping equal "gaps" together to facilitate counting their frequencies).

Let $k_1,k_2,\ldots,k_X$ be the original sample of positive integers, possibly with repetitions. After sorting these, which can be done with $O(X \ln X)$ complexity, we have a relabelled/ordered sample:

$$ m_1 \le m_2 \le \ldots \le m_X $$

Now we can take the consecutive differences, giving us $X-1$ gaps also with possible repetitions, $g_i = m_{i+1} - m_i$ for $i=1,\ldots,X-1$.

The second sort, sorting these $g_i$, gives you all the information about the largest, the smallest, and most frequent (mode) gap.


To be able to trace a particular gap back to the original items which produce that difference requires keeping some bookkeeping information with the arrays as we sort them. This is largely a programming implementation issue, but we can sketch the idea.

Suppose the original sample of five positive integers $k_i$ were numbered as "key-value" pairs (i,k):

(1,6) (2,5) (3,2) (4,11) (5,7) <---- k (unsorted)

Then sorting on the second "value" coordinate rearranges these into:

(3,2) (2,5) (1,6) (5,7) (4,11) <---- m (sorted)

We will then be able to recover the original location of such values from their first "key" coordinate (an index into the original sample sequence).

To continue the example, we construct the corresponding four gaps $g_i$ using the rearranged/sorted locations:

(1,3) (2,1) (3,1) (4,4) <---- g unsorted

The second sort uses the "gap" values and carries the new location-of-gap information along with it:

(2,1) (3,1) (1,3) (4,4) <---- g sorted

From this we see the smallest gap is 1, that this occurs twice, and that the largest gap is 4 (occurring only once).

If we want to know where that largest gap came from, we use the first coordinate of (4,4) to determine that it was the difference between the fourth and fifth sorted samples, $g_4 = m_5 - m_4$. Then we use the earlier array of pairs representing $m_i$ to find that $m_5$ was originally $k_4$ and $m_4$ was $k_5$.

Thus by adding these "keys" to the key-value pairs, we are able to tell which locations of the original sample produced the largest gap (or any other gap, as listed in the final differences).