Deriving algorithm for grouping the nearest items

56 Views Asked by At

Please help me arriving at an algorithm for the below scenario.

Scenario:

A database has coins, ID and value. There are 1500 coins. Each coin is having a unique ID which start from 1 till 1500. The value will change based on the market.

Lets us say I have around 30 coins which are picked out of the 1500 coins. All are scattered widely in the 1500 available coins.

Problem

In every call to the database it returns only 100 coins from the starting position we provide. This result will have only some of the 30 coins that we have

Goal

I have to fetch all the 30 coins with minimum number of calls to the database.

Example.. Please see the coin name and the ID. In the below example I am making 9 calls to get the details of all the coins. Surely there will be a better way to reduce the number of calls by following a algorithm which can be used commonly in all cases.

enter image description here

2

There are 2 best solutions below

1
On

If the set of possible IDs is reasonably small then the following approach could allow all 30 coins to be retrieved with one database call.

I am assuming that a database call is "Give me entries $X$" where $X$ is a set of pointers (or whatever indexing scheme is used by the database). From your problem description, $X$ is allowed to have at most 100 entries at a time.

I also assume that you know the IDs of your 30 coins.

Proposed solution:

  • Let $n$ be the number of possible IDs. Let $\mathcal{L}$ be a list of size $n$ such that $\mathcal{L}(i)$ is points to the coin with ID $i$. So for example $\mathcal{L}(1160934)$ points to the coin with ID "1160934". In your situation, you will have $n \geq 1500$ and most likely $n \gg 1500$.

  • When you want to get your 30 coins, you look up $\mathcal{L}(i)$ for each ID $i$ of those coins. This will construct a set of pointers $X$.

  • Call the database with $X$. As 30 < 100, this can be done in one call.

Basically, this solution trades 'calls to database' for storage (a variant on having smaller runtime complexity at the expense of larger storage complexity).

2
On

Let's recap:

  • There is a set of $n$ unique nonnegative integers $k_i$, $i = 1 \dots n$.

  • Each call spans a fixed range $[a,\; a+L-1]$, inclusive
    (In OP's case, $L=100$.)

  • The task is to find the minimum number of spans to cover all $k_i$

This is the one-dimensional ($d=1$) geometric set cover problem, for which a simple greedy algorithm will find the solution (but only in the one-dimensional case) in polynomial time.

(The time complexity is typically $O(n^2)$ or $O(n \log n)$, depending on your sort algorithm. If there exists an integer $K$ such that $0 \le k_i \le K$ for all $n$, then a radix sort can do this in $O(n)$ time complexity. However, for small $n$ like $n = 30$, any sort algorithm, even bubble sort, will perform acceptably in practice; use the one your programming language provides.)

The greedy algorithm is easiest to implement as follows:

  1. Sort $k_i$ in ascending order, so that $k_i \lt k_{i+1}$ for $i = 1 \dots n-1$.

  2. Each span will be $[k_i ,\, k_i + L - 1]$, inclusive, for the smallest $i$ not yet covered by a span.

In pseudocode:

Function Spans(Input  ID[], 
               Input  IDs,
               Input  width,
               Output span[]):

    Sort ID in ascending order

    Let  id_min_index = 1
    Let  id_min = ID[id_min_index]
    Let  id_max = ID[IDs]
    Let  spans = 1

    # Loop over all spans except the last one.
    While (id_max - id_min > width):

        # Span covers 'width' integers
        span[spans].start = id_min
        span[spans].end = id_min + width - 1
        spans = spans + 1

        # Skip the already covered IDs.
        id_min = id_min + width
        While (ID[id_min_index] < id_min):
            id_min_index = id_min_index + 1
        End While
        id_min = ID[id_min_index]

    End While

    # Final span.
    span[spans].start = id_min
    span[spans].end = id_min + width - 1

    Return spans
End Function