Global solution for spectral clustering

255 Views Asked by At

I used spectral clustering for directed graphs suggested by Dengyong Zhou paper to partition the graph.I selected the eigen vectors corresponding to k largest eigen values and then I use kmeans or FCM to partition them. For test the implementation, I made a synthetic graph (I created randomly 4 clusters with strong connections inside them and some random weak connections between them ) the result was very well and the algorithm was correct. But for real data, I don't know the structure or pattern of the graph (the number of clusters, the diameters of the clusters, etc.).When I partition the graph into 2 clusters(by the sign of the eigenvector corresponding to second largest eigenvalue and without using kmeans or FCM) the result seems to be ok (the connections inside the cluster are strong) and it is natural because it is the global optimum, but for more than 2 I don't know how to determine if the result is good or not. Is there any optimum global solution for more than 2 clusters? any suggestion?

I also ask this question in question

Thanks

1

There are 1 best solutions below

5
On

I'll address this question from the experimental side. There might be more mathematically rigorous methods that I'm unfamiliar with.

To begin with, there's not even a universally accepted definition of what a "cluster" in a network actually is. E.g. if clusters are allowed to overlap, then the spectral clustering method wouldn't even work. So, in essence, if you want to use this in the real world, you're not going to find a "correct answer" to the problem of determining whether the clustering is good or bad.

Often the level of satisfaction of clustering methods is based on one of the following:

  • Reputation. From my experience, spectral clustering has a good reputation.

  • Consistency with other clustering methods.

  • Comparison of the results with synthetic networks with artificial clusters (you've done this).

  • Comparison of the results with real-world networks with widely accepted clusters.

Unfortunately, such real-world networks are rare. I asked this question at the computer science Stackexchange site (link). The answer there was not particularly satisfying. There was two suggestions: the karate club network and the political blog network. The data would be around in the literature (the unclustered networks would be on Mark Newman's webpage), but the clusters will need to be manually curated from the publications/data.

Curiously, the best network with regards to clustering I found was my Facebook ego graph. The nodes are my Facebook friends and edges occur whenever two of them are Facebook friends. (Note: it's an undirected graph, but these are directed graphs with each edge replaced by a bidirectional edge.) Since I'm familiar with this network, I was able to manually curate this data and identify clear-cut clusters based on aspects of my life. If you're on Facebook, you can download your ego graph using NameGenWeb.