Find K Closest pairs in a spatial database (Without specific query object)

175 Views Asked by At

Input:

• N points {P1, …. , Pn} - each point is from the same dimension t:

Pi = {x_1, …., x_t} where k is between 18-30 . • Distance function – dist(Pi, Pj) - returns a number that is the distance between the points. (The function is a custom function – not a standard Minkowski distance).

The Problem:

• Main problem:

Find the K closest pairs from all the N points – as fast as possible. • Secondary problem:

Given a point Q = {x_1, …, x_t} return the K closest pairs. • Nice to Have:

A Database where we can add/remove a point Pi and the above queries will run as fast as possible. Relevant Data structures:

• KD-TREE

https://en.wikipedia.org/wiki/K-d_tree

http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html

• R-TREE

https://en.wikipedia.org/wiki/R-tree

http://toblerity.org/rtree/

• BALL-TREE

https://en.wikipedia.org/wiki/Ball_tree

http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.BallTree.html

Possible Solutions:

• Main problem:

Build a BallTree (sklearn.neighbors.KDTree).

For every point P in the BallTree find K closest pairs (Now we have N List- where every list contains the K closest pairs for every Point Pi).

Take the best K pairs from all the list above.

• Secondary problem:

Build a BallTree (sklearn.neighbors.KDTree).

Query for closest k pairs for a given point Q.

Time Complexity So far:

For every point in the tree (N in total) find K closest pairs which take O(K*log(N)) - So in total O(N * K * log(N)).

Take the best K pairs from N sorted list - can take O( Max{ K * log(K), N } ). for example with maintaining a min HEAP of size K.

The total complexity, for now, is O(N * K * log(N)) - can we do better?