Can one find the pairwise distances between all points from only a sparse matrix of distances between some points (using R potentially)?

420 Views Asked by At

I recently conducted a survey in a field and collected distances between different locations (not geo-referenced). These were taken in a relatively two-dimensional space (so we can assume it's flat). However, I have never done this type of study before and I'm not sure I collected the right type of data. I'm trying to figure out a way to salvage the data I do have.

What I have: Distances between random pairs of individuals (labeled as numbers; to --> from)

To From Distance(cm)
1 18 325
1 199 485
18 199 481
18 20 145
199 20 430
199 23 468
23 22 253
22 30 330
30 2 85
199 101 84
20 101 490
23 101 431
22 101 362
2 101 82
101 102 168
30 3 96
2 3 148
101 3 290
102 3 277
102 4 334
3 4 198

So from individual #1 to individual #18, it is 325 cm, etc.

Which produces a graph (although I cannot post it).

My question is: Given the distances between some of the points, is there a way to calculate pairwise, linear distances for all points?

I didn't collect any data on geo-referenced coordinates, although I believe it might be necessary to assume one in Euclidean space (i.e. assign one individual to coordinates (0,0) ?

Sadly, I do not have any angle measurements. This is actually just a sub-graph of a larger network, but which is not very big (~3x bigger). Each node has at least two measurements to other individuals, but it is not a complete matrix because I originally assumed I'd be able to back-calculate the distances (being none-the-wiser). However, now I believe I should have collected coordinates or at least some angles to trilaterate or triangulate the pairwise distances; I can't go back to get those types of measurements unfortunately. So, now I'm not sure if there is a mathematical solution to get the linear distances between points.

I've tried searching for similar problems, but have not found anyone with similar measures. The most relevant work seemed to be in geolocating but I lack a coordinate system so I cannot judge if those solutions are applicable. I just need to know if there is feasible mathematical solution and where to begin, or if there is no generalizable solution because there is not enough information (i.e. infinite solutions).

My level of comfort mathematics is not high (generally), and I haven't taken any practical application of trigonometry or geometry in 10+ years, so if someone could help provide a solution in simple terms, that would be really helpful.

Thanks!

2

There are 2 best solutions below

0
On

https://en.wikipedia.org/wiki/Cayley%E2%80%93Menger_determinant or https://en.wikipedia.org/wiki/Distance_geometry might provide some useful pointers.

Essentially you want a bunch of Cayley-Menger determinants of four points to become zero. That signifies that the tetrahedron formed by them has volume zero, meaning the points are coplanar. For $n$ points this would give you $\binom n4$ equations in the (squared) distances, and you'd need to satisfy them all to get a solution. Unfortunately the equations would be non-linear so you can't just combine them all into one big but simple linear system of equations. Instead you have several approaches at your disposal depending on the situation.

Each determinant allows you to compute the sixth distance if you already know five distances for a group of four points. Die some inputs this might be enough to allow you adding more distances one at a time until you have them all.

If not, you can build the whole big system of equations and feed them to a computer algebra system of your choice. Personally I'd use sage, and I'd also use exact arithmetic using algebraic numbers so I won't have to worry about any numeric issues along the way. If you are extremely lucky, this will yield a single solution where all distances are real numbers (i.e. discounting cases where squared distances are negative or complex). More likely you may get a finite set of such possible solutions. If you have any more information about the original points, e.g. some knowledge about the orientation of some triangles, that might help distinguish these solutions but most likely you would want to actually embed them into a coordinate space at that point.

Even more likely, the system of equations would be under- or over-determined. In the former case you don't have enough data to reconstruct all distances, although you might repeat the exercise with subsets to obtain some of the distances. In the latter case you might have conflicting data due to inaccurate measurements. You might try ignoring some of them and compute their value instead, or you might turn this whole thing into an optimization problem, looking for a solution that makes all the determinants really small even though they would not all be exactly zero. There would be a lot of choices in the details of how to approach this.

0
On

Having at least 2 measurements for each node is necessary, but not sufficient to determine the positions. Imagine that you have a vertex-pair {u,v} which cuts your graph to two components, if you delete it. Then by those two vertices you could flip one component of the graph to the other side while the distance constraints are still satisfied. By this you can see that the 3-vertex-connectivity of your graph is necessary for the reconstruction.

Unfortunately, not even 3-vertex-connectivity is sufficient. The condition is more involved, and called global rigidity link. It is NP-hard to find out if the given distances and positions represent a globally rigid framework or not, however can be determined quite easily if we consider the nodes to be in generic positions (see in the same slides).

Even if the framework is globally rigid (that is, your positions are uniquely localizable) to find the exact positions is a challenging task. The fastest and most advanced techniques use semidefinite programming link.

However, I believe that you are interested in a less involved and more practical solution. For this you can just write your loss function that gets one "possible" position of the nodes and calculates the squared errors relative to your measured distance constraints. If you fix 3 nodes in the beginning (say a triangle that you have) you can compute the position of all the other nodes. As MvG mentioned, you will need a non-convex optimizer for it. It will be slow, but as your graph is small, it should cope with it. My experience is that up to 50 nodes they solve such a problem in less then a second.

There are several nonconvex optimization possibilities for matlab, or python. There must be some for R, too.

I know my answer is late but I hope it can help some.