Let $A=\{A_1,A_2,\cdots,A_m\}$ and $B=\{B_1,B_2,\cdots,B_n\}$ be two sets of points in $k$-dimensional Euclidean space. Each point $A_i$ or $B_j$ can be thought of as the feature vector of a data sample. I want to know whether the distributions of $A$ and $B$ are similar or not.
I could perform a univariate analysis by drawing $k$ histograms for $A$ and $B$, respectively, and comparing them in each of the $k$ dimensions.
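For concreteness, the per-dimension histogram comparison could be sketched as follows (the sample data and the choice of 10 shared bins are my own assumptions, just for illustration):

```python
import numpy as np

# Hypothetical sample data: m=50 and n=60 points in k=3 dimensions
rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, size=(50, 3))
B = rng.normal(1.0, 1.0, size=(60, 3))

# Compare the marginal distribution of each of the k dimensions
for dim in range(A.shape[1]):
    # Shared bin edges so the two histograms are directly comparable
    bins = np.histogram_bin_edges(np.concatenate([A[:, dim], B[:, dim]]), bins=10)
    hist_A, _ = np.histogram(A[:, dim], bins=bins, density=True)
    hist_B, _ = np.histogram(B[:, dim], bins=bins, density=True)
    print(dim, np.abs(hist_A - hist_B).max())
```

This only inspects the marginals, so it can miss differences in the joint distribution, which is part of why I am looking for a cluster-level distance below.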
Alternatively, I could treat $A$ and $B$ as two clusters of points in Euclidean space and measure the distance between these two clusters; this is what I am asking about. There are various ways to define such a distance. I could use the minimal pairwise distance,
$$d(A,B)=\min_{i,j}||A_i-B_j||$$
where $||\cdot||$ is the L2 norm, or the distance between the centroids,
$$d(A,B)=||C_A-C_B||$$
where
$$ \begin{align*} C_A&=\frac1m\sum_{i=1}^mA_i\\ C_B&=\frac1n\sum_{j=1}^nB_j \end{align*} $$
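In code, these two definitions would look roughly like this (the sample data is a hypothetical placeholder):

```python
import numpy as np

# Hypothetical sample data: m=50 and n=60 points in k=3 dimensions
rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, size=(50, 3))
B = rng.normal(2.0, 1.0, size=(60, 3))

# Minimal pairwise distance: min_{i,j} ||A_i - B_j||
diffs = A[:, None, :] - B[None, :, :]            # shape (m, n, k)
d_min = np.sqrt((diffs ** 2).sum(axis=2)).min()

# Centroid distance: ||C_A - C_B||
d_centroid = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))
```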
The former is bad: in practice $A$ and $B$ overlap in some region, so the minimal distance is almost always nearly zero. The latter is better but has a limitation: if $A'$ has the same centroid as $A$ but $A'$ is more scattered than $A$, then it is undesirable that $d(A',B)=d(A,B)$; it should be $d(A',B)<d(A,B)$.
As an alternative way to establish the distance, I could take the standard deviations of the clusters into account:
$$d(A,B)=\frac{||C_A-C_B||}{s_As_B}$$
where $s_A$ and $s_B$ are the standard deviations of $A$ and $B$, respectively. Or I could define it as
$$ \begin{align*} d(A,B)&=\frac{||C_A-C_B||}{{s_A}^2{s_B}^2}\\ d(A,B)&=\frac{||C_A-C_B||}{{s_A}^2+{s_B}^2} \end{align*} $$
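A sketch of the last candidate definition. Note that the question leaves $s_A$ unspecified for $k>1$, so the root-mean-square distance of the points to their centroid is my own assumption for a scalar spread:

```python
import numpy as np

def spread(X):
    """Scalar spread of a cluster: RMS distance of its points to the centroid.

    This is one possible multivariate analogue of a standard deviation;
    other choices (e.g. per-dimension deviations) are equally plausible.
    """
    return np.sqrt(((X - X.mean(axis=0)) ** 2).sum(axis=1).mean())

def d_normalized(A, B):
    """||C_A - C_B|| / (s_A^2 + s_B^2), the last candidate definition."""
    num = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))
    return num / (spread(A) ** 2 + spread(B) ** 2)
```

With this definition, scattering $A$ about its centroid leaves the numerator unchanged while growing the denominator, so the desired property $d(A',B)<d(A,B)$ holds.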
Is there a standard way of defining this distance?
Note 1: I have heard the term "within-cluster sum of squares" in the context of k-means clustering, but it does not seem to involve the standard deviation directly.
Note 2: ChatGPT recommended the last equation.