Why do the "important" eigenvectors of a graph Laplacian have small-magnitude eigenvalues?


In spectral clustering, one computes sample-sample similarities and from these builds a graph Laplacian matrix. (Typically, one uses the symmetrically normalized Laplacian, but the pattern I'm asking about seems to hold in general.) Next, one computes the eigendecomposition of the graph Laplacian. Notably, the "important" eigenvectors are the ones with small eigenvalues, as seen in the plot below, from A Tutorial on Spectral Clustering (von Luxburg, 2007). There, data is generated from a mixture of four Gaussians, and the four clusters correspond to the four smallest eigenvalues.

[Figure: eigenvalues of the graph Laplacian for the four-Gaussian example]
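The pipeline described above can be sketched in a few lines of NumPy. This is only an illustrative toy version, not von Luxburg's exact setup: the 1-D mixture, the cluster sizes, and the kernel bandwidth sigma = 1 are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points from a mixture of four well-separated 1-D Gaussians
# (a toy stand-in for the tutorial's example).
centers = np.array([0.0, 5.0, 10.0, 15.0])
X = np.concatenate([rng.normal(c, 0.5, size=25) for c in centers])

# Gaussian (RBF) similarity matrix; sigma = 1.0 is an illustrative choice.
sigma = 1.0
W = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * sigma**2))

# Symmetrically normalized Laplacian: L_sym = I - D^{-1/2} W D^{-1/2}.
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt

# Eigenvalues in ascending order: the four smallest are the "important"
# ones, and there is a visible gap before the fifth.
eigvals = np.linalg.eigvalsh(L_sym)
print(eigvals[:6])
```

With four well-separated clusters, the first four eigenvalues come out near zero and the fifth is markedly larger, reproducing the eigenvalue gap in the plot.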

Furthermore, it can be shown that the number of connected components equals the multiplicity of the eigenvalue 0. In other words, each connected component contributes an eigenvector with eigenvalue 0.
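This fact is easy to verify numerically on a toy graph (the graph below, with a triangle and a disjoint edge, is an arbitrary example):

```python
import numpy as np

# Adjacency matrix of a graph with two connected components:
# a triangle on nodes {0, 1, 2} and a single edge on nodes {3, 4}.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

L = np.diag(A.sum(axis=1)) - A   # unnormalized graph Laplacian
eigvals = np.linalg.eigvalsh(L)

# The multiplicity of eigenvalue 0 equals the number of components.
num_components = int(np.sum(np.abs(eigvals) < 1e-10))
print(num_components)  # → 2
```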

Intuitively, this is a bit surprising. For example, in principal component analysis (PCA), the components with the largest eigenvalues explain most of the variance. Why is it different for the graph Laplacian?

Accepted answer:

The negative of the graph Laplacian, $-L$, is the generator of a continuous-time Markov chain in which you leave each node at a rate equal to its degree, jumping to a neighboring node chosen uniformly at random. Given an initial distribution $p$ as a row vector, the distribution at time $t$ is $p e^{-Lt}$. Now the small eigenvalues of $L$ correspond to the (relatively) large eigenvalues of $e^{-Lt}$.
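A small numerical illustration of this correspondence (the path graph and the choice $t = 1$ are arbitrary). Since $L$ is symmetric, $e^{-Lt}$ has the same eigenvectors, with each eigenvalue $\lambda$ of $L$ mapped to $e^{-\lambda t}$:

```python
import numpy as np

# Path graph on 4 nodes: 0 - 1 - 2 - 3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# e^{-Lt} via the eigendecomposition of the symmetric matrix L.
t = 1.0
lam, V = np.linalg.eigh(L)                      # lam is ascending
heat_kernel = V @ np.diag(np.exp(-lam * t)) @ V.T

# Ascending eigenvalues of L become descending eigenvalues of e^{-Lt}:
# the smallest eigenvalue of L is the largest (slowest-decaying) mode.
print(lam)
print(np.exp(-lam * t))

# Rows of e^{-Lt} sum to 1, so p e^{-Lt} is again a distribution.
print(heat_kernel.sum(axis=1))
```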

There is no probability current into or out of a connected component in a random walk on an undirected graph, so for each connected component the chain has an invariant distribution concentrated on it; this distribution is an eigenvector with eigenvalue $1$ in $e^{-Lt}$ and thus eigenvalue $0$ in $L$. The small but nonzero eigenvalues of $L$ correspond to slow probability currents, e.g. between two "large" sets of vertices connected through some sort of "chokepoint".
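The chokepoint picture is easy to reproduce numerically. Here is a sketch with two cliques joined by a single bridge edge (an arbitrary "barbell" example): the graph is connected, so there is exactly one zero eigenvalue, followed by a small second eigenvalue reflecting the slow flow across the bottleneck.

```python
import numpy as np

# Two complete graphs K_5, joined by a single "chokepoint" edge.
n = 5
block = np.ones((n, n)) - np.eye(n)
A = np.zeros((2 * n, 2 * n))
A[:n, :n] = block
A[n:, n:] = block
A[n - 1, n] = A[n, n - 1] = 1.0   # the bottleneck edge

L = np.diag(A.sum(axis=1)) - A
eigvals = np.linalg.eigvalsh(L)

# One exact zero (connected graph), then a small second eigenvalue,
# then a jump up to the "fast" within-clique modes.
print(eigvals[:3])
```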

There is an analogous line of thinking for discrete time, but then you need to use the random walk normalized Laplacian instead in order to make the comparison quantitative.
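In discrete time the relation is exact: with $L_{rw} = I - D^{-1}A$, the transition matrix of the random walk is $P = D^{-1}A = I - L_{rw}$, so the eigenvalues of $P$ are $1$ minus those of $L_{rw}$, and small eigenvalues of $L_{rw}$ are exactly the slow modes of the walk. A toy check (triangle graph chosen arbitrarily):

```python
import numpy as np

# Triangle graph K_3.
A = np.array([
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
], dtype=float)

D_inv = np.diag(1.0 / A.sum(axis=1))
P = D_inv @ A                  # transition matrix of the random walk
L_rw = np.eye(3) - P           # random walk normalized Laplacian

# Eigenvalues of P are exactly 1 minus the eigenvalues of L_rw.
p_eigs = np.sort(np.linalg.eigvals(P).real)
lrw_eigs = np.sort(np.linalg.eigvals(L_rw).real)
print(p_eigs)
print(1 - lrw_eigs[::-1])
```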

tl;dr version: it is because $x \mapsto e^{-x}$ is a decreasing function.