metric on clustering of correlation matrix using silhouette score


Given a correlation matrix $A_n$, a metric $\|\cdot \|_2$ and a clustering $k$, you can calculate the silhouette score (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_samples.html). Then define $score = silhouette.mean/silhouette.std$. If you have two different clusterings $k_1$ and $k_2$ such that $score(k_1) < score(k_2)$, will it still be true that $score(k_1) < score(k_2)$ when $A_n$ is embedded inside any $A_m$ with $n < m$?
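For reference, here is the per-point silhouette value that `silhouette_samples` computes for a point $i$ in cluster $C_I$, with $d$ the chosen metric and $N$ the number of clustered points. A point alone in its cluster gets $s(i) = 0$. Reading `silhouette.std` as the population standard deviation (NumPy's default) is an assumption:

$$a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\, j \ne i} d(i, j), \qquad b(i) = \min_{J \ne I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j),$$

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}, \qquad \bar s = \frac{1}{N} \sum_{i} s(i), \qquad score(k) = \frac{\bar s}{\sqrt{\frac{1}{N} \sum_{i} \bigl(s(i) - \bar s\bigr)^2}}.$$

Note that $score(k)$ is undefined when every $s(i)$ is equal, since the denominator is then $0$.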


It is easy to see that this need not hold. Choose 4 points on the real line that form 2 clusters. In $\mathbb{R}^2$, choose a $y$ and give one element of each cluster the second coordinate $y$; give the other 2 elements the second coordinate $-y$. Make $y$ sufficiently large and you get the opposite clustering.

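Edit (proof): Let $r$ be the largest distance in $\mathbb{R}$ between a point of cluster 1 and a point of cluster 2. Let $x_1$, $x_2$ belong to cluster 1 and $x_3$, $x_4$ belong to cluster 2.

In $\mathbb{R}^2$ choose $y > r$ (so in particular $2y > r$) and the points $x_1'=(x_1, y)$, $x_2'=(x_2,-y)$, $x_3'=(x_3, y)$ and $x_4'=(x_4, -y)$. Points with the same second coordinate differ only horizontally. Points with opposite second coordinates are at least $2y$ apart, so

$$0 < \|x_1' - x_3'\| = |x_1 - x_3| \le r < 2y \le \sqrt{(x_1 - x_2)^2 + 4y^2} = \|x_1' - x_2'\|.$$

Also:

$$0 < \|x_1' - x_3'\| \le r < 2y \le \sqrt{(x_1 - x_4)^2 + 4y^2} = \|x_1' - x_4'\|.$$

So $x_1'$ is now closer to $x_3'$, which came from the other cluster, than to $x_2'$ or $x_4'$. The same argument applies to every point, so the natural clusters in $\mathbb{R}^2$ are $\{x_1', x_3'\}$ and $\{x_2', x_4'\}$. This is the opposite of the clustering in $\mathbb{R}$.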
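The following sketch checks the counterexample numerically. It reimplements the silhouette coefficient in plain Python with Euclidean distance instead of calling scikit-learn, and it reads `std` as the population standard deviation. The concrete values $x = 0, 1, 10, 11$ and $y = 100$ are arbitrary choices, and the helper names `silhouette_values` and `score` are hypothetical.

```python
import math
import statistics


def silhouette_values(points, labels):
    """Per-point silhouette values s(i) with Euclidean distance.

    a = mean distance to the rest of the own cluster, b = smallest mean
    distance to another cluster; a point alone in its cluster gets 0.
    Requires at least two distinct labels.
    """
    values = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster
            values.append(0.0)
            continue
        a = statistics.fmean(own)
        b = min(
            statistics.fmean(math.dist(p, q)
                             for j, q in enumerate(points) if labels[j] == c)
            for c in set(labels) if c != labels[i]
        )
        values.append((b - a) / max(a, b))
    return values


def score(points, labels):
    """Hypothetical helper: mean / population std of the silhouette values."""
    s = silhouette_values(points, labels)
    return statistics.fmean(s) / statistics.pstdev(s)  # ZeroDivisionError if std == 0


# Four points on the line: cluster 1 = {x1, x2}, cluster 2 = {x3, x4}.
xs = [0.0, 1.0, 10.0, 11.0]
k1 = [0, 0, 1, 1]  # the clustering that is natural in R
k2 = [0, 1, 0, 1]  # the "opposite" clustering
r = 11.0           # largest distance between a point of cluster 1 and one of cluster 2
y = 100.0          # any y > r works
assert y > r

line = [(x,) for x in xs]
plane = [(xs[0], y), (xs[1], -y), (xs[2], y), (xs[3], -y)]

print(score(line, k1) > score(line, k2))    # True: k1 scores higher in R
print(score(plane, k1) < score(plane, k2))  # True: k2 scores higher in R^2
```

With these numbers $k_1$ wins on the line (a score of roughly 180 versus $-9$). In the plane every silhouette value under $k_1$ is negative, so $k_2$ wins, which matches the argument above.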