Why does the silhouette coefficient of a clustering always lie in $[-1,+1]$?

I have been recently reading about clustering validation and came upon the silhouette coefficient, represented by the following formula.

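$$ s_i = \frac{b_i - a_i}{\max(a_i, b_i)}, $$ where $a_i$ is the mean distance from $x_i$ to the other points of its own cluster and $b_i$ is the smallest mean distance from $x_i$ to the points of any other cluster.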

Everywhere I read about this coefficient, it says that it always lies in $[-1,1]$, with a value close to $1$ meaning that $x_i$ is close to the points of its own cluster, a value close to $0$ meaning that $x_i$ is close to the boundary between two clusters, and a value close to $-1$ meaning that $x_i$ is closer to another cluster than to its own.

However, what they don't explain is why the coefficient always lies in this interval. Why does this happen? How can it be mathematically demonstrated?

Best answer

This reduces to the question "Given nonnegative numbers $u$ and $v$, at least one of them nonzero, why is $$ s = \frac{u - v}{\max(u, v)} $$ always between $-1$ and $1$?"

There are a couple of ways to see this. One is to just plot the function using a tool like Desmos. That shows it's true, but not WHY it's true. It may be useful for you to do it anyhow.
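If you'd rather check numerically than plot, here is a minimal Python sketch. The function names `s` and `sample_range` are made up for this illustration, and sampling $u$ and $v$ uniformly from $[0, 10]$ is an arbitrary choice. Like the plot, it only suggests that the bound holds; it doesn't explain why.

```python
import random


def s(u, v):
    """Return (u - v) / max(u, v) for nonnegative u, v that are not both zero."""
    if u < 0 or v < 0:
        raise ValueError("u and v must be nonnegative")
    if u == 0 and v == 0:
        raise ValueError("at least one of u and v must be nonzero")
    return (u - v) / max(u, v)


def sample_range(n=10_000, seed=0):
    """Evaluate s at n random points and return the smallest and largest value seen."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = []
    for _ in range(n):
        u = rng.uniform(0.0, 10.0)  # arbitrary sampling range (an assumption)
        v = rng.uniform(0.0, 10.0)
        if u == 0.0 and v == 0.0:
            continue  # s is undefined when both are zero
        values.append(s(u, v))
    return min(values), max(values)


lo, hi = sample_range()
print("smallest s seen:", lo)
print("largest s seen:", hi)
```

The extreme values are attained only on the axes: `s(3, 0)` gives `1.0` and `s(0, 3)` gives `-1.0`, while random samples stay strictly inside the interval.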

Let's look at three possible cases: $u = v$, $u > v$, $u < v$.

In the first case, the numerator is zero, and the denominator is nonzero, so the quotient has the form $\frac{0}{A}$, with $A \ne 0$, and the value is zero.

Case 2: let's assume that $u > v$, or more precisely, that $u > v \ge 0$. The numerator $u - v$ is positive. The denominator is the larger of $u$ and $v$, which is $u$, so we have $$ s = \frac{u-v}{u} = \frac{u}{u} - \frac{v}{u} \le \frac{u}{u} = 1. $$ So our result is never greater than $1$. (I've used the fact that $u$ and $v$ are nonnegative, and $u > 0$, to conclude that $\frac{v}{u}$ is nonnegative, so subtracting it from $\frac{u}{u}$ results in a number no larger than $\frac{u}{u}$.)

What about showing that $s \ge -1$? It's similar, and in this case we actually get the stronger bound $s > 0$. We have $$ s = \frac{u}{u} - \frac{v}{u} = 1 - \frac{v}{u}. $$ How small can this number be? The larger the second term, the smaller the number is. Let's look at that second term. We know (because we're in case 2) that $u > 0$ and $v < u$. Dividing through by $u$, we get \begin{align} \frac{v}{u} &< 1\\ -\frac{v}{u} &> -1 & \text{negating an inequality reverses it!}\\ \frac{u}{u}-\frac{v}{u} &>\frac{u}{u} -1 & \text{Add the same thing to both sides}\\ \frac{u}{u}-\frac{v}{u} &>1 -1 & \text{Simplify one of the two fractions.}\\ \frac{u}{u}-\frac{v}{u} &>0. \end{align} Hence in the case where $u > v \ge 0$, we have that $0 < s \le 1$, with $s = 1$ exactly in the case where $v = 0$.

The third case, $v > u \ge 0$, is similar, except that the denominator throughout is $v$ instead of $u$; I'll let you work through that one to discover that in that case, $s$ lies between $-1$ and $0$ (more precisely, $-1 \le s < 0$), and is $-1$ only when $u = 0$.
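
If you want something to check your work against, here is one way the third case might be written out, following the same steps as Case 2. Since $v > u \ge 0$, the denominator is $\max(u, v) = v > 0$, and dividing $0 \le u < v$ through by $v$ gives $0 \le \frac{u}{v} < 1$. Then \begin{align} s &= \frac{u - v}{v} = \frac{u}{v} - \frac{v}{v} = \frac{u}{v} - 1\\ 0 - 1 &\le \frac{u}{v} - 1 < 1 - 1 & \text{Subtract $1$ from each part of $0 \le \frac{u}{v} < 1$}\\ -1 &\le s < 0. \end{align} Equality on the left holds exactly when $\frac{u}{v} = 0$, that is, when $u = 0$. Putting the three cases together, $s$ is $0$ when $u = v$, lies in $(0, 1]$ when $u > v$, and lies in $[-1, 0)$ when $u < v$, so $-1 \le s \le 1$ always.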