Best vector distance measure for contrasting clustering?

Question

Best vector distance measure for contrasting clustering?

149 Views Asked by Bumbble Comm At 10 May 2026 - 12:53

Let say i have V-vectors, everyone with size N, which measure of distance should I use if i want to cluster them by the following criteria : the more elements vector has in common with other vectors of the cluster and the fewer elements in common with members of the contrasting clusters.

Ideally the number of clusters are dynamic and the vectors are added one-by one. But I don't think this have impact on the distance measure.

I found Tversky contrast distance ! It seems to be a Set distance metric, whatever that means. On the other hand the formula looks nice.

d(x,y) = a * f(x intersect y) - b * f(x-y) - c * f(y-x)

in binary :

d(x,y) = a * count(x&y) - b * count(x&~y) - c * count(~x&y)

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Answer 1 · 2018-06-26 03:23:08

It's not clear if your "vector" is an unordered set of $N$ discrete objects, but that's what it seems like. You can simply define a distance metric e.g. like: $$ d(x,y) = \left[\gamma+ \sum_i \sum_j L[x_i,y_j] \right]^{-2} $$ where $\gamma\in\mathbb{R}^+$ and $$ L[x_i,y_j] = \begin{cases} 1 &\text{if} \;\;x_i = y_j\\ 0 & \text{otherwise} \end{cases} $$ Or if you view each $x$ as a set, you can use something like: $$ \tilde{d}(x,y) = \left( \gamma + \frac{1}{N}\sum_k {1}_{x_k\in y} \right)^{-1} $$

Then run pretty much any clustering algorithm using this metric.

As for the clustering being dynamic, you basically have to use a clustering algorithm that does not take the number of clusters as inputs. Algorithms like mean shift, affinity propagation, dbscan, or a Bayesian Gaussian mixture with a Dirichlet prior. You can get coordinates, if desired, by using the distance matrix $D$, where $D_{ij} = d(x_i,x_j)$, with an embedding algorithm like multidimensional scaling.

These two questions ([1] and [2]) are related. Some nice papers I saw that can do dynamic estimation of the number of clusters with online updates are:

Sequential clustering with particle filtering - Estimating the number of clusters from data by Schubert & Sidenbladh
Revisiting k-means: New Algorithms via Bayesian Nonparametrics by Kulis & Jordan
Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture by Campbell et al
Streaming Clustering with Bayesian Nonparametric Models by Huynh & Phung

Best vector distance measure for contrasting clustering?

There are 1 best solutions below

Related Questions in CLUSTERING

Trending Questions

Popular # Hahtags

Popular Questions