When we perform spectral clustering, given a similarity matrix $S$, we define the Laplacian matrix $L$ (normalized or unnormalized). Then, we do eigenvalue decomposition on $L$ and get its eigenvector matrix.
Why do we do eigenvalue decomposition on $L$, and not on the similarity matrix $S$?
It seems that a graph's spectrum (i.e. the eigenvalues of its Laplacian matrix) contains the structural information of a graph, but why? Why does it explain the graph structure?
Why the Laplacian?
The main idea in spectral clustering is:

1. Represent the data points as the vertices of a weighted graph, with edge weights given by the pairwise similarities.
2. Partition the graph into $k$ groups such that the total weight of edges between groups is low, while the weight of edges within groups is high.
Step 2 can be reformulated as finding the minimum-weight 'cut' of edges required to separate the graph into $k$ components.
It turns out* that minimising the value of the cut is equivalent to minimising a function of the Laplacian $L = D - W$:
$$\min_{A_1,...,A_k} \textrm{Tr}(H'LH) \textrm{ subject to } H'H = I$$
where the columns of $H \in \mathbb{R}^{n \times k}$ are (suitably normalised) indicator vectors of the $k$ vertex sets $A_1,\ldots,A_k$ partitioning the graph.
Note: solving this exactly is in general NP-hard, given the discrete definition of $H$. However, a relaxed version, in which the entries of $H$ may take any real value, is solved by choosing $H$ as the matrix whose columns are the first $k$ eigenvectors of $L$ (those corresponding to the $k$ smallest eigenvalues).
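As a sanity check, the trace objective can be evaluated directly on a toy graph. The sketch below (the graph, weights, and partition are illustrative assumptions, not from the text) uses the RatioCut normalisation $h_j = \mathbf{1}_{A_j}/\sqrt{|A_j|}$, under which $H'H = I$ and $\mathrm{Tr}(H'LH)$ equals the size-normalised cut value:

```python
import numpy as np

# Toy weighted graph: two triangles joined by a single weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1           # the only between-cluster edge

D = np.diag(W.sum(axis=1))        # degree matrix
L = D - W                         # unnormalised Laplacian

# Indicator matrix H for the partition A1 = {0,1,2}, A2 = {3,4,5},
# scaled so that H'H = I (RatioCut normalisation).
A1, A2 = [0, 1, 2], [3, 4, 5]
H = np.zeros((6, 2))
H[A1, 0] = 1 / np.sqrt(len(A1))
H[A2, 1] = 1 / np.sqrt(len(A2))

trace = np.trace(H.T @ L @ H)
# RatioCut = sum_j cut(A_j, complement) / |A_j|; the cut weight is 0.1
ratio_cut = 0.1 / 3 + 0.1 / 3
print(np.allclose(trace, ratio_cut))   # True: the trace recovers the cut
```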
These eigenvectors span a space containing a lower dimensional embedding of our graph, within which (due to the nature of the minimisation problem) our clusters are now easily separable.
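The relaxed procedure can be sketched end-to-end in NumPy. The toy graph below (two triangles joined by one weak edge; the weights are illustrative assumptions) separates cleanly in the eigenvector embedding:

```python
import numpy as np

# Toy weighted graph: two tight clusters linked by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

L = np.diag(W.sum(axis=1)) - W    # unnormalised Laplacian

# Relaxed solution: columns of H are the first k eigenvectors of L
# (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(L)
k = 2
H = eigvecs[:, :k]                # low-dimensional embedding of the nodes

# In this embedding the clusters are linearly separable; for k = 2 the
# sign of the second eigenvector (the Fiedler vector) gives the split.
labels = (H[:, 1] > 0).astype(int)
print(labels)   # nodes {0,1,2} share one label, {3,4,5} the other
```

In practice one would run k-means on the rows of $H$ rather than thresholding a single eigenvector, but for two clusters the sign split illustrates the idea.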
Properties of the Laplacian
Intuition for this may be motivated by recognising that the Laplacian has several useful properties absent from the adjacency matrix:

- $L$ is symmetric and positive semi-definite, so all its eigenvalues are real and non-negative;
- its smallest eigenvalue is $0$, with the constant vector as an eigenvector;
- the multiplicity of the eigenvalue $0$ equals the number of connected components of the graph;
- for any $x \in \mathbb{R}^n$, $x'Lx = \frac{1}{2}\sum_{i,j} w_{ij}(x_i - x_j)^2$, which directly measures how much $x$ varies across edges.
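These properties (positive semi-definiteness, the constant vector in the kernel, and the link between the zero eigenvalue and connected components) are easy to verify numerically. The sketch below uses a small illustrative graph with two connected components, an assumption for demonstration only:

```python
import numpy as np

# A five-node graph with two connected components: a triangle {0,1,2}
# and a single edge {3,4}.
W = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    W[i, j] = W[j, i] = 1.0

L = np.diag(W.sum(axis=1)) - W
eigvals = np.linalg.eigvalsh(L)

print(np.allclose(L, L.T))                 # symmetric
print(np.all(eigvals > -1e-10))            # positive semi-definite
print(np.allclose(L @ np.ones(5), 0))      # constant vector in the kernel
print(np.sum(np.abs(eigvals) < 1e-10))     # 2: one zero eigenvalue per component

# Quadratic form identity: x'Lx = 0.5 * sum_ij w_ij (x_i - x_j)^2
x = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
lhs = x @ L @ x
rhs = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2
                for i in range(5) for j in range(5))
print(np.isclose(lhs, rhs))
```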
Further reading
* See Chapter 5 of the following document for a full derivation:
- von Luxburg, A Tutorial on Spectral Clustering (2006)
For further reading, see the original papers proposing the method:

- Fiedler, Algebraic connectivity of graphs (1973)
- Donath & Hoffman, Lower bounds for the partitioning of graphs (1973)
Note that Fiedler himself states that, prior to this, the adjacency matrix (and the incidence matrix) were indeed used to characterize graphs: