Is there an algorithm for finding the approximate nearest neighbour from an extremely large number of vectors?

37 Views Asked by At

Let's say I have some vector $n$ of length 10, and I also have a bag (i.e. multiset) of 1000 numbers, $m$:

$$n = [0.2, 0.3, 0.0, 0.5, 0.8, 0.9, 0.2, 0.6, 1.0, 0.4]$$

$$m = \{0.1, 0.3, 0.7, 0.3, 0.5, 0.9, 1.0, 0.3, ...\}$$

How can I construct a vector $l$ from the elements of $m$ such that the distance between $l$ and $n$ is minimised? Elements are taken from $m$ without replacement. The specific distance metric doesn't really matter — Euclidean, Manhattan, I'm not fussy.

Obviously, the number of permutations here is extremely large, so brute-forcing isn't an option. Is there an approximate method for finding this vector, even if it's something only moderately better than choosing randomly?

1

There are 1 best solutions below

0
On

For the Manhattan distance, you can think of this an an Assignment problem: assigning a member of the $m$ to each index of your vector $n$, where the cost for assigning $m_j$ to index $i$ is $|m_j - n_i|$. Or for Euclidean distance use the cost $|m_j - n_i|^2$. Efficient algorithms are well-known.