Common issues with graph partitioning algorithms

32 Views Asked by At

In the Graph Representation Learning book by Hamilton (P. 24), it is mentioned that :

[...] one option to define an optimal clustering of the nodes into K clusters would be to select a partition that minimizes [the cut value $$\operatorname{cut}\left(\mathcal{A}_{1}, \ldots, \mathcal{A}_{K}\right)=\frac{1}{2} \sum_{k=1}^{K}\left|(u, v) \in \mathcal{E}: u \in \mathcal{A}_{k}, v \in \overline{\mathcal{A}}_{k}\right|$$ ]. There are efficient algorithms to solve this task, but a known problem with this approach is that it tends to simply make clusters that consist of a single node

This sounds counter-intuitive; if the algorithm selects each node as a cluster, then the number of edges would be maximal. Also, that means that the total number of nodes in the graph are equal to K, which is not necessarily true.

Is there something that I'm misunderstanding?

1

There are 1 best solutions below

2
On BEST ANSWER

Suppose we want to partition the graph below into five clusters:

enter image description here

This graph has five disjoint $4$-cliques and it seems to be a natural solution to make them be the clusters. Between any two cliques, there are only two edges, which seems like a fairly sparse connection compared to the six edges within a clique. The cut value in this case would be $2\binom 52 = 20$.

However, a less-pleasing solution that reduces the cut value would be to split one of the $4$-cliques into four clusters of one node each, and take the other $16$ nodes to be the last cluster. The cut value in this case is $14$: there are $6$ edges between the four tiny clusters, and $8$ edges from them to the huge cluster.

This is an example of the phenomenon in the quoted paragraph. Even when two large sets of vertices are relatively sparsely connected, there might be many edges between them just because they're... large sets of vertices. Taking most of the clusters to be single nodes can give a smaller cut value even if it's not really breaking up the graph in the way we want it to be broken up once we understand its structure.