Number of clusters in $k$-means clustering for higher-dimensional data.

Asked by Eric S. on 06 Jun 2025 at 6:18 (195 views)

I've read the Wikipedia article on determining the number of clusters in a data set, as well as many posts on Stack Exchange on the same topic (like this really thorough one). Based on that, I am currently using silhouette analysis in MATLAB.

Clustering $2$-dimensional data (around $10^3$ points) works fine: I compute the average silhouette value for $k=2,3,\ldots,k_{\mathrm{max}}$ for some $k_{\mathrm{max}}\in\mathbb{N}$ and pick the $k$ with the highest average silhouette value. This takes less than a minute to run.
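The selection procedure described above (cluster for each candidate $k$, compute the mean silhouette $s(i) = (b(i)-a(i))/\max(a(i),b(i))$, keep the best $k$) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not what MATLAB does internally: it assumes Euclidean distance, a hand-rolled Lloyd's $k$-means with $k$-means++ seeding and a few restarts, and the convention $s(i)=0$ for a point alone in its cluster. The helper names (`kmeans`, `mean_silhouette`, `best_k`) and the three-blob test data are hypothetical.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points stored as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, rng, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns (labels, sse)."""
    # k-means++: each new seed is drawn with probability proportional to D(x)^2
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(dist(p, c) ** 2 for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2)[0])
    labels = None
    for _ in range(n_iter):
        # assignment step: every point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # update step: centroid = mean of its members (unchanged if the cluster is empty)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b); visits every pair of points (O(n^2))."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: s(i) = 0 for a point alone in its cluster
        # a: mean distance to the other members of its own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(points[i], points[j]) for j in idx) / len(idx)
                for other, idx in clusters.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_max, n_init=5, seed=0):
    """Return the k in 2..k_max with the highest mean silhouette, plus all scores."""
    rng = random.Random(seed)
    scores = {}
    for k in range(2, k_max + 1):
        # keep the restart with the smallest within-cluster sum of squares
        labels, _ = min((kmeans(points, k, rng) for _ in range(n_init)), key=lambda r: r[1])
        scores[k] = mean_silhouette(points, labels)
    return max(scores, key=scores.get), scores


# Hypothetical test data: three well-separated 2-D Gaussian blobs, 30 points each.
data_rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
          for cx, cy in centers for _ in range(30)]
k, scores = best_k(points, k_max=6)
print(k, {kk: round(s, 3) for kk, s in scores.items()})
```

On these synthetic blobs $k = 3$ should come out on top. The double loop in `mean_silhouette` touches every pair of points, which is the part that grows quadratically with the number of points.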

However, with $4$-dimensional data (around $10^5$ points), this approach takes a long time. The clustering itself (using kmeans in MATLAB) is still fairly quick, but computing the silhouette values is slow. So my thought was that one of the other methods might be faster, hence my question:
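A likely reason for the slowdown: the silhouette value of each point needs its mean distance to every other point, so the exact computation costs $n(n-1)/2$ distance evaluations for each candidate $k$, while one Lloyd iteration of $k$-means costs only about $nk$. Going from $10^3$ to $10^5$ points therefore multiplies the silhouette cost by roughly $10^4$. A common workaround is to estimate the mean silhouette on a random subsample of $m$ points (scikit-learn's `silhouette_score`, for instance, has a `sample_size` parameter for this). The sketch below is hypothetical plain Python: it assumes Euclidean distance, labels already produced by some clustering, and illustrative helper names and data.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points stored as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pair_count(n):
    """Number of distinct point pairs the exact silhouette has to visit."""
    return n * (n - 1) // 2


def mean_silhouette(points, labels):
    """Exact average silhouette value; O(n^2) distance evaluations."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: s(i) = 0 for a singleton cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist(points[i], points[j]) for j in idx) / len(idx)
                for other, idx in clusters.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


def sampled_silhouette(points, labels, m, seed=0):
    """Estimate the mean silhouette from m randomly chosen points: O(m^2) distances.

    Assumes m is large enough that every cluster keeps at least two sampled points.
    """
    idx = random.Random(seed).sample(range(len(points)), m)
    return mean_silhouette([points[i] for i in idx], [labels[i] for i in idx])


print(pair_count(10**3), pair_count(10**5))  # 499500 4999950000

# Hypothetical labelled data: three 2-D blobs, 150 points each, labels taken as given.
rng = random.Random(1)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points, labels = [], []
for lab, (cx, cy) in enumerate(centers):
    for _ in range(150):
        points.append((cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)))
        labels.append(lab)
exact = mean_silhouette(points, labels)
approx = sampled_silhouette(points, labels, m=100)
print(round(exact, 3), round(approx, 3))
```

With, say, $m = 2000$ sampled points the estimate needs about $2\times 10^6$ pairs instead of roughly $5\times 10^9$. The estimate is noisy, so it seems sensible to check that the selected $k$ does not change across a few seeds.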

Can anyone provide insight into how the different methods for choosing the optimal number of clusters in $k$-means clustering perform in higher dimensions?
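As one data point on the cost side: criteria that only need the centroids are much cheaper than the silhouette. The Calinski–Harabasz (variance ratio) index, for example, is

$$\mathrm{CH}(k)=\frac{B/(k-1)}{W/(n-k)},\qquad B=\sum_{j=1}^{k} n_j\lVert c_j-\bar{x}\rVert^2,\qquad W=\sum_{i=1}^{n}\lVert x_i-c_{\ell(i)}\rVert^2,$$

where $c_j$ and $n_j$ are the centroid and size of cluster $j$, $\bar{x}$ is the overall mean and $\ell(i)$ is the cluster of point $i$; larger is better. Once $k$-means has finished it costs $O(nd)$ rather than $O(n^2 d)$. The following is a hypothetical plain-Python sketch (own $k$-means, Euclidean distance, synthetic blobs, illustrative helper names). MATLAB's evalclusters lists 'CalinskiHarabasz' among its criteria, but this code does not reproduce that function.

```python
import random
from collections import Counter


def sqdist(p, q):
    """Squared Euclidean distance between two points stored as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(pts):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def kmeans(points, k, rng, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centroids, W)."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sqdist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2)[0])
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = centroid(members)
    # W: within-cluster sum of squares, a by-product of k-means itself
    w = sum(sqdist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, w


def calinski_harabasz(points, labels, centroids, w):
    """CH = (B / (k - 1)) / (W / (n - k)); linear in n, no pairwise distances."""
    n, k = len(points), len(centroids)
    overall = centroid(points)
    sizes = Counter(labels)
    # B: between-cluster sum of squares, weighted by cluster size
    b = sum(sizes[j] * sqdist(centroids[j], overall) for j in range(k))
    return (b / (k - 1)) / (w / (n - k))


def choose_k(points, k_max, n_init=5, seed=0):
    """Return the k in 2..k_max with the largest CH index, plus all scores."""
    rng = random.Random(seed)
    scores = {}
    for k in range(2, k_max + 1):
        labels, cents, w = min((kmeans(points, k, rng) for _ in range(n_init)),
                               key=lambda r: r[2])
        scores[k] = calinski_harabasz(points, labels, cents, w)
    return max(scores, key=scores.get), scores


# Hypothetical test data: three well-separated 2-D Gaussian blobs, 30 points each.
data_rng = random.Random(7)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
          for cx, cy in centers for _ in range(30)]
best, scores = choose_k(points, k_max=6)
print(best, {kk: round(s, 1) for kk, s in scores.items()})
```

Different criteria can disagree on data without clear, compact clusters, so a cheap index like this is better viewed as a fast screening step than as a guaranteed stand-in for the silhouette.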
