I keep encountering the term "convex cluster" and I cannot work out what it means. I am exploring different types of clustering methods, and in the description sections some mention advantages or drawbacks related to the convex properties of the resulting clusters. A clear explanation would be much appreciated.
What defines a convex cluster, and how does it differ from other types?
Asked by Bumbble Comm. Two answers follow.
I felt I could give a little more detail as I read about this recently.
Common algorithms such as $k$-means are very simple but, as you say, a major drawback is that the clusters they form are always convex. What does this mean and why is it true?
Let's remind ourselves how the algorithm works. It starts with a fixed value of $k$ (the number of clusters) and typically randomly splits the points to be clustered amongst the various clusters $C_1, C_2,..., C_k$. We then iterate the following steps:
- Compute the "average point" $p_i$ of each cluster $C_i$ (literally the coordinate-wise average of all the points in the cluster); this is called the centroid of that candidate cluster.
- New clusters are formed by assigning a point to $C_i$ iff it is closer to $p_i$ than to any other cluster's centroid.
- Return to the first step, stopping once the assignments no longer change.
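These steps can be sketched in Python. This is a minimal illustration only - it uses a deterministic round-robin initial split rather than the random one described above, and a real implementation does considerably more:

```python
def kmeans(points, k, iters=100):
    """Minimal k-means sketch: alternate centroid update and reassignment.

    Deterministic round-robin initialisation for reproducibility; real
    implementations typically initialise randomly or with k-means++.
    """
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Initial split: point i goes to cluster i mod k.
    labels = [i % k for i in range(len(points))]
    centroids = []
    for _ in range(iters):
        # Step 1: the centroid of each cluster is the coordinate-wise average.
        centroids = []
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            # Fallback: reseed an emptied cluster at an arbitrary data point.
            centroids.append(tuple(sum(c) / len(members) for c in zip(*members))
                             if members else points[i])
        # Step 2: reassign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda i: dist2(p, centroids[i]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids

# Two well-separated blobs are recovered correctly:
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, cents = kmeans(pts, 2)
```

On data like this it works nicely; the annuli example later in the answer shows where it breaks down.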
The idea is that after a number of iterations this "self-correction" process will produce nice, neat clusters. Assuming the process does indeed converge to some stable clusters, we end up dividing the space of points into a Voronoi tessellation.
Recall that convexity means that if two points are in a cluster then the straight line segment drawn between them also lies entirely within that cluster - you'll find an explanation on the Wikipedia page as to why Voronoi tessellations are always convex. Their explanation is this:
Given any two centroids, we may draw a line/plane/hyperplane (depending on the dimensionality of the space) between them, separating the points closer to one centroid from the points closer to the other. The two resulting regions are called half-spaces, and they are convex.
The intersections of convex sets in space are also convex.
A given Voronoi cell, i.e. the set of points in space closest to a given centroid $p_i$, is the intersection of all the half-spaces pairing $p_i$ with each other centroid (all this says is that the points closest to $p_i$ are those closer to $p_i$ than to any other centroid $p_j$), and these half-spaces are all convex.
Thus Voronoi cells are convex. Since clusters formed by $k$-means are Voronoi cells, $k$-means creates convex clusters.
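This argument can be spot-checked numerically. In the sketch below the three centroid positions are made up for illustration; membership in a Voronoi cell coincides with membership in all of that centroid's half-spaces, and the midpoint of two points in a cell stays in the cell:

```python
import random

centroids = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]  # arbitrary example centroids

def dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def cell(p):
    """Voronoi cell of p: the index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))

def in_halfspace(p, ci, cj):
    """Is p on ci's side of the bisector between ci and cj?"""
    return dist2(p, ci) <= dist2(p, cj)

rng = random.Random(42)
violations = 0
for _ in range(2000):
    p = (rng.uniform(-6, 6), rng.uniform(-6, 6))
    q = (rng.uniform(-6, 6), rng.uniform(-6, 6))
    i = cell(p)
    # Cell membership == lying in every half-space belonging to centroid i.
    if not all(in_halfspace(p, centroids[i], c)
               for j, c in enumerate(centroids) if j != i):
        violations += 1
    # Convexity spot-check: midpoints of same-cell points stay in the cell.
    if cell(q) == i:
        mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
        if cell(mid) != i:
            violations += 1
```

Random sampling is of course a probe rather than a proof; the proof is the half-space intersection argument above.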
The intuition here is that convex sets are round, i.e. circles and spheres and ellipses and such are all convex. Bendy shapes like a boomerang, the letter "N" or an annulus are all classic examples of nonconvex sets. Why is this a problem? Simply that the clusters you are actually looking for might not be convex. This is a fact that we have to deal with as humans: although convex shapes are nice to think about and maybe the "classic cluster" in our minds might look convex, there's no reason to expect real-world data to be like this.
A great example of a clustering technique I'm fond of, called spectral clustering, can be found here. It takes as input this set of annuli:
This dataset has 3 clear clusters - the concentric circles. What does $k$-means do if we run it with $k=3$?
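The images don't survive here, but the outcome is easy to reproduce in code. Below is a minimal sketch - with two concentric rings rather than three, and a deterministic toy $k$-means, so it is not the exact demo referred to above:

```python
import math

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Minimal deterministic k-means (round-robin init) for illustration."""
    labels = [i % k for i in range(len(points))]
    for _ in range(iters):
        centroids = []
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            centroids.append(tuple(sum(c) / len(members) for c in zip(*members))
                             if members else points[i])
        new_labels = [min(range(k), key=lambda i: dist2(p, centroids[i]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Two concentric rings: the "obvious" clusters are the rings themselves.
inner = [(math.cos(2 * math.pi * t / 30), math.sin(2 * math.pi * t / 30))
         for t in range(30)]
outer = [(5 * math.cos(2 * math.pi * t / 30), 5 * math.sin(2 * math.pi * t / 30))
         for t in range(30)]
labels = kmeans(inner + outer, 2)
ring = [0] * 30 + [1] * 30  # the clustering we'd like to see

# k-means cannot return the ring partition: with k=2 its clusters are
# half-planes, and no half-plane can contain the inner ring while
# excluding the whole of the surrounding outer ring.
```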
$k$-means forms convex clusters and completely misses what we, as humans, see very clearly. What does spectral clustering split the data into?
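As a sketch of the idea (again on two rings): build a neighbourhood graph, take the eigenvectors of its Laplacian - a close relative of the adjacency matrix - for the smallest eigenvalues, and read the clusters off that embedding. All parameter choices below (ring sizes, the threshold 1.2) are my own illustrative assumptions, and the final grouping step is a simplification: here the graph has one connected component per ring, so the embedding rows are constant on each ring and equal rows can be grouped directly, whereas a full implementation runs $k$-means on the embedding rows.

```python
import numpy as np

# Two concentric rings, 30 points each (radii 1 and 5) - toy data.
t = np.linspace(0, 2 * np.pi, 30, endpoint=False)
pts = np.vstack([np.c_[np.cos(t), np.sin(t)],            # inner ring
                 np.c_[5 * np.cos(t), 5 * np.sin(t)]])   # outer ring

# Neighbourhood graph: adjacency W connects distinct points closer than 1.2.
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
W = ((d > 0) & (d < 1.2)).astype(float)

# Unnormalised graph Laplacian L = D - W.
L = np.diag(W.sum(axis=1)) - W

# Eigenvectors for the two smallest eigenvalues form the spectral embedding.
eigvals, eigvecs = np.linalg.eigh(L)
embedding = eigvecs[:, :2]

# One connected component per ring, so these eigenvectors are constant on
# each ring: embedding rows take exactly two distinct values, one per ring.
labels = (np.linalg.norm(embedding - embedding[0], axis=1) > 1e-6).astype(int)
# labels now separates the two rings perfectly.
```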
Brilliant! Why does this work so well? The long answer is rather involved, relating to the brilliant idea behind spectral clustering, which is to notice that the distribution of the eigenvalues of the adjacency matrix of a graph is strongly related to the distribution of clusters within the graph (this is a whole field of study called spectral graph theory).
The short answer is that this clustering technique relies on deep structural mathematical results which prove a connection between what the algorithm examines and the clusters in the data, whereas $k$-means is simply an algorithm created because it "intuitively should work" and often... well... doesn't!
There are various attempts at getting around this, such as hierarchical $k$-means, but in my opinion these attempts are trying to fix something whose underlying mechanics are flawed. The thing to take away is that while every clustering method will give some result, think a little about the assumptions your algorithm makes about the shape of the clusters before applying it to any old dataset!
Basically, in a convex cluster, you can draw a straight line from any point in the cluster to any other point in the cluster without leaving the cluster. For example, a U-shaped cluster would not be convex because you could not draw a straight line from one end of the U to the other without leaving the cluster and crossing across empty space.
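That straight-line test is easy to turn into code. Below, the U-shaped region (a square with a notch cut out of the top) is made up for illustration, and sampling along the segment between the two tips is a probe of convexity, not a proof:

```python
def in_u_shape(x, y):
    """Made-up U-shaped region: a 3x3 square with a notch cut from the top."""
    in_square = 0 <= x <= 3 and 0 <= y <= 3
    in_notch = 1 < x < 2 and 1 < y <= 3
    return in_square and not in_notch

def segment_stays_inside(p, q, inside, samples=100):
    """Convexity probe: does every sample on the segment p-q satisfy `inside`?"""
    return all(inside(p[0] + (q[0] - p[0]) * i / samples,
                      p[1] + (q[1] - p[1]) * i / samples)
               for i in range(samples + 1))

left_tip, right_tip = (0.5, 2.5), (2.5, 2.5)
# Both tips lie in the U, but the segment between them crosses the notch:
u_ok = segment_stays_inside(left_tip, right_tip, in_u_shape)  # False
# The full square (no notch) passes the same probe:
square_ok = segment_stays_inside(left_tip, right_tip,
                                 lambda x, y: 0 <= x <= 3 and 0 <= y <= 3)  # True
```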
[Image: a convex and a non-convex example I drew]
The Wikipedia page on convex sets explains this in more detail.