Unsupervised clustering in $10$ dimensions

I have a set of $\sim1000$ feature vectors in $\sim10$ dimensions and would like to cluster them in an unsupervised manner. I am expecting some of the vectors to bunch together in groups, but quite a lot to be outliers that are nowhere near each other (so $\sim5$ meaningful clusters and $1$ cluster which is just a uniform distribution in all dimensions).

I'm thinking of using a Gaussian mixture model (GMM); does that sound reasonable? Is learning a GMM suitable for data of this dimension, or is there perhaps a more suitable technique? Does $1000$ vectors sound like enough to do $10$-dimensional clustering? I am quite new to this, so I am trying to get a feel for it. Thanks very much for any insight you might be able to provide! :)

There is 1 solution below

Your data are not "high dimensional" ($1000 \times 10$ is small), but the question you are asking doesn't have a "right" answer. Depending on what you need, I would suggest 2 different approaches:

K-means is probably the easiest out-of-the-box algorithm in your case. Beyond that, the answer depends a lot on what you are trying to achieve.
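To give a feel for what K-means does, here is a minimal sketch of Lloyd's algorithm in plain Python. Everything in it is an assumption made for illustration: the function name `kmeans`, its parameters, the synthetic 10-dimensional data, and the choice of `k=3` are not from your problem. In practice you would probably reach for a library implementation (for example scikit-learn's `KMeans`) rather than roll your own.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (illustrative sketch).

    points: list of equal-length tuples of floats.
    Returns (centroids, labels), where labels[i] is the index of the
    centroid nearest to points[i].
    """
    rng = random.Random(seed)
    # Initialise centroids with k data points drawn without replacement.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                updated.append(
                    tuple(sum(m[d] for m in members) / len(members)
                          for d in range(dim))
                )
            else:
                # An empty cluster keeps its previous centroid.
                updated.append(centroids[j])
        centroids = updated
    return centroids, labels


# Hypothetical data: three Gaussian blobs of 50 points each in 10 dimensions.
rng = random.Random(42)
blob_centers = [[0.0] * 10, [8.0] * 10, [-8.0] * 10]
data = [
    tuple(rng.gauss(mu, 1.0) for mu in center)
    for center in blob_centers
    for _ in range(50)
]

centroids, labels = kmeans(data, k=3)
```

Note that the result depends on the initial centroids, so it is common to run the algorithm from several random starts and keep the solution with the lowest within-cluster sum of squared distances. You also have to choose `k` yourself.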

By the way, I think your last cluster, the one that is uniform along all dimensions, will be hard to find in an unsupervised manner.
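Part of the difficulty is that K-means assigns every point to some cluster, so uniformly scattered points get absorbed into whichever centroid happens to be nearest. One possible workaround, which is only a heuristic sketch and not a proper model of the uniform background, is to flag points that lie far from every centroid. The function name `flag_background`, the toy 2-D data, and the `threshold` value below are all hypothetical; a sensible threshold would have to come from your own data.

```python
import math


def flag_background(points, centroids, threshold):
    """Return one boolean per point: True if the point is farther than
    `threshold` from every centroid (a candidate background point)."""
    flags = []
    for p in points:
        nearest = min(math.dist(p, c) for c in centroids)
        flags.append(nearest > threshold)
    return flags


# Toy example: two centroids, two points near them and two far from both.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.1, -0.2), (9.8, 10.1), (5.0, 5.0), (0.0, 9.0)]
flags = flag_background(points, centroids, threshold=1.0)
# flags is [False, False, True, True]
```

This only works if the real clusters are reasonably compact, and background points that happen to fall inside a cluster cannot be told apart from its members.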