Reordering rows in an array for minimum total Euclidean distance


I am looking for an algorithm that can find the shortest path between a set of vertices in 5 dimensions given a single source.

For example, suppose I have a 10 x 5 array in which each row corresponds to a point in 5-dimensional space. How do I reorder the rows so that travelling from the top to the bottom of the array gives the minimum total Euclidean distance travelled?

My end goal is to perform this operation in Python.

I feel this must be a well-defined problem, but my research has not led me anywhere conclusive. It seems to me that a solution would involve graph theory, perhaps a minimum spanning tree. I'd be very grateful for any suggestions.


There are 2 best solutions below

1
On BEST ANSWER

This is the Euclidean version of the travelling salesman problem (strictly, the path variant, since you do not return to the starting point). It is NP-hard, so finding the optimal solution is hard when your array is very large. (For $10$ points in five dimensions, even trying all $10!$ orders is not entirely out of the question.) Techniques based on integer programming can solve larger problems.
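As a rough illustration of the "try every order" idea for small inputs, here is a minimal sketch using only the standard library. The function name `brute_force_order` and the `source` parameter are made up for this example; it assumes the points are given as a list of rows (or anything `math.dist` accepts) and that the row at index `source` must come first. With the source fixed there are $(n-1)!$ orders to check, so $9! = 362880$ for ten points.

```python
import itertools
import math


def brute_force_order(points, source=0):
    """Return (order, length) of the shortest path that starts at `source`
    and visits every point exactly once, by checking all permutations."""
    n = len(points)
    # Precompute all pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]
    others = [i for i in range(n) if i != source]

    best_order, best_len = None, math.inf
    for perm in itertools.permutations(others):
        order = (source,) + perm
        length = sum(dist[a][b] for a, b in zip(order, order[1:]))
        if length < best_len:
            best_order, best_len = list(order), length
    return best_order, best_len


if __name__ == "__main__":
    # Four points on a line, deliberately out of order.
    pts = [[0.0], [3.0], [1.0], [2.0]]
    order, length = brute_force_order(pts, source=0)
    print(order, length)
```

If the data is a NumPy array `X`, then `X[order]` gives the reordered rows. This is only practical for around ten points; the running time grows factorially.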

There are various approximation algorithms. Wikipedia mentions a method based on a minimum spanning tree that gets within a factor of $2$ of the shortest tour. An improvement gives a $\frac32$-approximation and is still not too bad to implement (the Christofides algorithm).

For the Euclidean version in a fixed number of dimensions, there is an algorithm that can guarantee arbitrarily good approximations efficiently, but I suspect it is too complicated to be worth the bother.
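The minimum-spanning-tree idea can be sketched as follows: build the tree with Prim's algorithm starting from the source, then visit the vertices in depth-first preorder. This is only a sketch of that heuristic adapted to an open path with a fixed start, not a reference implementation; the names `mst_preorder_path` and `path_length` are invented here, and the input is assumed to be a list of rows.

```python
import math


def mst_preorder_path(points, source=0):
    """Heuristic visiting order: depth-first preorder of a minimum
    spanning tree rooted at `source` (tree built with Prim's algorithm)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge linking each vertex to the tree
    parent = [None] * n
    children = [[] for _ in range(n)]
    best[source] = 0.0

    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u

    # Iterative depth-first preorder walk of the tree.
    order, stack = [], [source]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order


def path_length(points, order):
    """Total Euclidean length of visiting `points` in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


if __name__ == "__main__":
    pts = [[0.0], [3.0], [1.0], [2.0]]
    order = mst_preorder_path(pts, source=0)
    print(order, path_length(pts, order))
```

The preorder walk shortcuts a traversal that uses each tree edge twice, so by the triangle inequality the resulting path is at most twice the weight of the spanning tree, which in turn is no heavier than the optimal path.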

8
On

We can use hierarchical clustering with single linkage. At each step, you merge the two groups of nodes that are closest, where the distance between two groups is defined as the minimum distance between a point in one group and a point in the other. A quick search on single-linkage clustering will show that the concept is what you are looking for. The Python scikit-learn package provides this algorithm. It creates clusters in the form of chains. If you keep merging until all points are in one cluster, you will get what you are looking for.

A very simple example: let's say your points have $1$ dimension. The data is $1, 2, 7, 9, 20$.

It first groups $1$ and $2$. Then it groups $7$ and $9$. Then it groups $1, 2, 7, 9$ together, since $2$ and $7$ have the smallest single-linkage between-cluster distance. Finally, it adds $20$ to the group.
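The merge sequence in this example can be reproduced with a naive single-linkage loop. This is a hand-written sketch for one-dimensional data only, not the scikit-learn implementation; the function name `single_linkage_merges` is made up for illustration.

```python
def single_linkage_merges(values):
    """Naive single-linkage clustering of 1-D values.

    Returns one tuple per merge: (cluster_a, cluster_b, linking_pair, distance),
    where linking_pair is the closest pair of points across the two clusters.
    """
    clusters = [[v] for v in values]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d, a, b = min(
                    (abs(a - b), a, b) for a in clusters[i] for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j, a, b)
        d, i, j, a, b = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), (a, b), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges


if __name__ == "__main__":
    for left, right, pair, d in single_linkage_merges([1, 2, 7, 9, 20]):
        print(f"merge {left} + {right} via {pair} at distance {d}")
```

The linking pairs come out as $(1, 2)$, $(7, 9)$, $(2, 7)$ and $(9, 20)$, matching the steps described above.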

At each stage of merging, just record the pair of points that links the two clusters being merged, since that pair extends the chain. Do this up to the end and you will get what you want. (You can either write the algorithm yourself, which is easy, or modify the scikit-learn hierarchical clustering function to get what you want.)

EDIT: After thinking more, I realized this algorithm doesn't work, since the closest pair between two groups is not necessarily the end of one chain and the beginning of the other. In other words, if $A \to B \to C$ is the preferred traversal path in one group and $D \to E \to F$ is the preferred path in another, merging these two chains might happen at $B \to E$. So even rearranging the points doesn't help, as it violates the idea of having the smallest distance from one point to the next.

Could anyone confirm whether this is an NP-hard problem when $n$ gets really large?