Way to measure the similarity/difference of 2D point clouds


I need a way to measure the similarity or difference of two 2D point clouds. The number of points in each cloud can differ, and the clouds are already aligned. By similarity I mean the similarity of the shapes.

I have already tried the following approaches:

  • principal component analysis
  • Hausdorff distance
  • mean squared distance of the points
  • mean distance of the points

None of them worked.

I have included examples of similar and of non-similar point clouds.

[Images: examples of non-similar point clouds and of similar point clouds]

There are 3 best solutions below


Here is one method I can think of. Since each point cloud forms a curve, create a function from $[0,1]$ to $\Bbb R^2$ out of each point cloud. In essence, "connect the dots". Call the function of the first cloud $A$ and the function of the second cloud $B$. So $A(0)$ would be the first point in cloud $A$, $A(1)$ would be the final point in cloud $A$, and $A(0.5)$ would be the point halfway along the curve.

To measure the similarity, compute the following integral.

$$\int_0^1\|A(x)-B(x)\|dx$$

A value of $0$ means that $A=B$, and everything greater than $0$ indicates some type of dissimilarity. Note that if two curves are somewhat similar, making them longer will increase this value, although their "similarity" should probably stay the same. To make up for this, you could divide the integral by the sum of the lengths of the curves.
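A rough numerical sketch of this idea in Python with NumPy (the function names are mine, and parameterizing both clouds by normalized arc length is one reasonable reading of "halfway along the curve"):

```python
import numpy as np

def resample_curve(points, n=200):
    """Reparameterize an ordered 2D polyline by normalized arc length,
    returning n points evenly spaced along the curve."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    s /= s[-1]                                   # arc length mapped to [0, 1]
    t = np.linspace(0.0, 1.0, n)
    return np.column_stack([np.interp(t, s, pts[:, k]) for k in range(2)])

def curve_distance(cloud_a, cloud_b, n=200):
    """Approximate the integral of ||A(x) - B(x)|| over [0, 1] as the
    mean pointwise distance of the two resampled curves."""
    A = resample_curve(cloud_a, n)
    B = resample_curve(cloud_b, n)
    return np.linalg.norm(A - B, axis=1).mean()
```

Dividing the result by the sum of the two curve lengths would give the length-normalized variant suggested above.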


I suggest a family of metrics which may be combined (by weighted means, for example) to yield a single number. Finding a good threshold for this number (i.e. if $\mathrm{mydist}(A,B) \le \epsilon$, the clouds $A$ and $B$ are deemed similar) then completes the algorithm.

$$d_i(A,B) = \frac1{|A|+|B|} \left( \sum_{a\in A} \mathrm{dist}_i^i(a,B) + \sum_{b\in B} \mathrm{dist}_i^i(b,A) \right)$$ where $\mathrm{dist}_i$ is the usual distance from a point to a set using the $\|\cdot\|_i$-norm. The exponentiation is optional. As written, $d_1$ is the average Manhattan distance and $d_2$ is the average squared Euclidean distance.
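A direct Python/SciPy translation of this formula might look as follows (the function name `point_set_metric` is illustrative; `cdist` gives all pairwise $\ell_p$ distances, so each point-to-set distance is just a row or column minimum):

```python
import numpy as np
from scipy.spatial.distance import cdist

def point_set_metric(A, B, p=2):
    """Symmetric average point-to-set distance in the l_p norm,
    raised to the p-th power as in the formula above."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    D = cdist(A, B, metric="minkowski", p=p)  # pairwise l_p distances
    a_to_B = D.min(axis=1) ** p               # dist_p(a, B)^p for each a in A
    b_to_A = D.min(axis=0) ** p               # dist_p(b, A)^p for each b in B
    return (a_to_B.sum() + b_to_A.sum()) / (len(A) + len(B))
```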

Another option would be $$\tilde d_k (A,B) := \|f_A - f_B\|_k$$ where $f_C$ is an appropriate interpolation of the point cloud as a curve (for example using splines). This is appropriate if you don't really care about the points themselves but rather about the curves they form. Note that this approach must take care to choose the "correct" point sequence for the curve (the order in which the points are traversed). If your cloud has some inherent ordering (for example a measured trajectory), you can use that. If your cloud resembles a two-dimensional shape (for example a square), this approach will not work properly; in such a case you might want to find the convex hull of the cloud first.
To deal with the "halfway" problem discussed in the comments to @Regret's answer, you can enforce $\|\nabla f_C\|_2 \equiv c$, ensuring a constant-speed traversal of the curve. Choosing $c$ appropriately such that $f_C : [0,1]\to\mathbb R^2$ then makes the distances $\tilde d_k$ well-defined and $0$ iff the interpolation curves are geometrically identical.
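A sketch of this constant-speed reparameterization in Python with SciPy, assuming the points are already in traversal order (`splprep` with the default cubic degree needs at least four points; the function name is mine):

```python
import numpy as np
from scipy.interpolate import splprep, splev

def constant_speed_curve(points, n=200, smooth=0.0):
    """Fit a parametric spline to an ordered 2D point cloud, then
    resample it at (approximately) equal arc-length steps so the
    resulting parameterization has near-constant speed on [0, 1]."""
    pts = np.asarray(points, dtype=float)
    tck, _ = splprep([pts[:, 0], pts[:, 1]], s=smooth)
    # Dense evaluation to estimate arc length along the spline.
    u = np.linspace(0.0, 1.0, 20 * n)
    x, y = splev(u, tck)
    xy = np.column_stack([x, y])
    seg = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    s /= s[-1]
    # Invert arc length: pick parameter values at equal length steps.
    u_eq = np.interp(np.linspace(0.0, 1.0, n), s, u)
    x_eq, y_eq = splev(u_eq, tck)
    return np.column_stack([x_eq, y_eq])
```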


Some ideas that might prove useful: assuming you've already applied the iterative closest point algorithm, as you mentioned in the comments, I would first fit a smooth curve (a B-spline, via least squares or quadratic programming; a fast fit in either case) to each data set, say $\mathbf{C}_1(t)$, $\mathbf{C}_2(t)$, where $t\in [0,1]$ is some affine parameter. It is to be understood that each function represents a set of points. The good thing about these curves is that you can take advantage of their geometric properties (gradient, curvature, even topological properties, etc.). Clearly I would expect two similar sets of points, respectively curves, to be similar on many levels.

You can find starting point information on b-splines and fitting here: http://en.wikipedia.org/wiki/B-spline#Curve_fitting

There exist several implementations in several programming languages for your needs.

Now there exist many nice scalar (i.e. orientation-independent) properties that you can use to compare your results. Some examples:

  • Similar spatial positions: $$d_1 = \int_0^1 \|\mathbf{C}_1(t) - \mathbf{C}_2(t) \| dt$$
  • Similar spatial positions for tangents: $$d_2 = \int_0^1 \|\dot{\mathbf{C}}_1(t) - \dot{\mathbf{C}}_2(t) \| dt$$
  • Minimize angle between tangent vectors: $$d_3 = \int_0^1 \|\dot{\mathbf{C}}_1(t) \cdot \dot{\mathbf{C}}_2(t) \| dt$$
  • Minimize curvature differences: $$d_4 = \int_0^1 \|\kappa_1(t) - \kappa_2(t) \| dt,$$ $\kappa_i$ being the curvature of each curve
  • Minimize angle between acceleration vectors: $$d_5 = \int_0^1 \|\ddot{\mathbf{C}}_1(t) \cdot \ddot{\mathbf{C}}_2(t) \| dt$$
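Assuming ordered points and a SciPy parametric B-spline fit, the positional, tangent, tangent-angle and curvature functionals above could be approximated roughly as follows (the uniform-grid mean stands in for the integral over $[0,1]$; all names are mine):

```python
import numpy as np
from scipy.interpolate import splprep, splev

def curve_metrics(P1, P2, n=400, smooth=0.0):
    """Approximate d_1..d_4 above by averaging over a uniform grid on
    [0, 1], using parametric B-spline fits C_1(t), C_2(t)."""
    t = np.linspace(0.0, 1.0, n)
    fitted = []
    for P in (P1, P2):
        P = np.asarray(P, dtype=float)
        tck, _ = splprep([P[:, 0], P[:, 1]], s=smooth)
        c = np.column_stack(splev(t, tck))           # C(t)
        dc = np.column_stack(splev(t, tck, der=1))   # C'(t)
        ddc = np.column_stack(splev(t, tck, der=2))  # C''(t)
        # signed curvature of a planar parametric curve
        kappa = (dc[:, 0] * ddc[:, 1] - dc[:, 1] * ddc[:, 0]) \
                / np.linalg.norm(dc, axis=1) ** 3
        fitted.append((c, dc, kappa))
    (c1, dc1, k1), (c2, dc2, k2) = fitted
    d1 = np.linalg.norm(c1 - c2, axis=1).mean()      # positions
    d2 = np.linalg.norm(dc1 - dc2, axis=1).mean()    # tangents
    d3 = np.abs(np.sum(dc1 * dc2, axis=1)).mean()    # tangent dot products
    d4 = np.abs(k1 - k2).mean()                      # curvatures
    return d1, d2, d3, d4
```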

I assume there must exist topological criteria as well (closed / open / self-intersecting curves) that you can use.

I would not try to find a scalar that is some linear combination of these, instead I would work with a vector similarity/fitness function: $$\mathbf{f} = (d_1, \ldots, d_n)$$ and assign different numerical thresholds for each entry. In a software engineering approach, two similar curves would be identified when each entry of the fitness function is smaller than some threshold (different for each slot). The idea of not combining all the above in one scalar, comes from multi-objective optimization.
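As a sketch, the acceptance test is then just an entry-wise comparison of the fitness vector against its thresholds (names illustrative):

```python
def is_similar(fitness, thresholds):
    """Accept two curves only if every entry of the fitness vector
    falls below its own threshold."""
    return all(d <= eps for d, eps in zip(fitness, thresholds))
```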

Important: I would identify the optimal thresholds for each entry from ideal theoretical models (construct points from similar curves with random noise of some order, and establish similarity criteria / threshold).

Hope it helps.