Motivations for Shi-Malik Algorithm


So I've been trying to make sense of the clustering algorithm on page 6 of this paper.

Are the "first" k eigenvalues they refer to the smallest eigenvalues?

What are the $y_i$ exactly? I don't see the motivation for using them.

If anyone could link to other literature or explain how and why the first eigenvectors are used, that would be great.

There are 2 answers below.

Answer 1:

Yes, the eigenvalues are the $k$ smallest eigenvalues, starting from eigenvalue zero. The vectors $y_i$ are the rows of the matrix $U$ whose columns are the eigenvectors $u_1, u_2, \ldots$ of the Laplacian.
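As a sketch of this construction in NumPy (the graph and weights below are made up for illustration, not taken from the paper):

```python
import numpy as np

# Hypothetical toy graph: two triangles joined by one weak edge (weight 0.1).
W = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])
D = np.diag(W.sum(axis=1))
L = D - W                      # unnormalized graph Laplacian

# eigh returns eigenvalues in ascending order, so the "first" k
# eigenvectors are those of the k smallest eigenvalues.
vals, vecs = np.linalg.eigh(L)
k = 2
U = vecs[:, :k]                # columns are u_1, ..., u_k
# Row i of U is the k-dimensional embedding y_i of node i;
# running k-means on these rows gives the spectral clustering.
y = U                          # y[i] corresponds to y_i
```

The second column of `U` (the Fiedler vector) already separates the two triangles by sign.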

For the symmetric Laplacian $L$, the $y_i$ and $u_i$ coincide. For the non-symmetric Laplacian $L_{rw}$, one has to distinguish between left and right eigenvectors. The author probably wanted to avoid defining left eigenvectors and used $y_i$ instead.

As far as I know, the reason why the eigenvectors of the Laplacian are so good at revealing clustering and partitioning properties of complex systems is not very well understood. The intuition behind using the first eigenvectors is best conveyed by an analogy from physics: the eigenvectors of the Laplacian can be thought of as the vibration modes of a drum skin. The eigenvector $u_1$ corresponding to the lowest eigenvalue is the ground state; the next eigenvector $u_2$ corresponds to the first harmonic, dividing the drum skin into two halves, and so on.
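This picture can be checked numerically on a path graph, the discrete analogue of a vibrating string (a quick sketch, not from the paper):

```python
import numpy as np

# Path graph on 8 nodes: the discrete analogue of a vibrating string.
n = 8
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(W.sum(axis=1)) - W

vals, vecs = np.linalg.eigh(L)
u1, u2 = vecs[:, 0], vecs[:, 1]

# Ground state: constant vector (no sign change).  First harmonic:
# crosses zero exactly once, splitting the string into two halves.
assert np.allclose(u1, u1[0])
sign_changes = int(np.count_nonzero(np.diff(np.sign(u2)) != 0))
print(sign_changes)  # 1
```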

Answer 2:

The reasons are explained later in (von Luxburg 2007).

You can find more insight about the motives for using $\mathbf y_i$ in (Belkin et al. 2003). Basically, the eigenvectors $\mathbf u_1, \dots, \mathbf u_n$ are functions mapping the manifold underlying the graph to the real line. If $\mathbf f$ is such a function, then $\mathbf f^\top \mathbf L \mathbf f = \frac{1}{2} \sum_{i,j=1}^n \mathbf W_{ij}(\mathbf f_i - \mathbf f_j)^2$ measures how far apart nearby points are mapped by $\mathbf f$. When $\mathbf f$ is a unit-norm eigenvector $\mathbf u_i$ of $\mathbf L$, then $\mathbf f^\top \mathbf L \mathbf f$ equals the eigenvalue $\lambda_i$.
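Both facts are easy to verify numerically (a sketch with a made-up random weight matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up symmetric weight matrix with zero diagonal.
A = rng.random((5, 5))
W = np.triu(A, 1) + np.triu(A, 1).T
L = np.diag(W.sum(axis=1)) - W

# f^T L f = 1/2 * sum_ij W_ij (f_i - f_j)^2 for any vector f.
f = rng.standard_normal(5)
quad = f @ L @ f
smooth = 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2)
print(np.isclose(quad, smooth))  # True

# For a unit-norm eigenvector, the quadratic form is its eigenvalue.
vals, vecs = np.linalg.eigh(L)
u = vecs[:, 1]
print(np.isclose(u @ L @ u, vals[1]))  # True
```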

Also, the distances computed from the kernel obtained by the Moore–Penrose pseudo-inverse of the unnormalized Laplacian $\mathbf L=\mathbf D-\mathbf W$, where $\mathbf W$ is the adjacency matrix of the graph and $(\mathbf D)_{ii}=\sum_{j=1}^n \mathbf W_{ij}$, are proportional to the commute time distances in the graph (the proportionality constant is the graph volume $\operatorname{vol}(G)=\sum_i (\mathbf D)_{ii}$).

In particular, since $\mathbf L^\dagger = \mathbf {U\Lambda^{-1}U^\top}$, where $\mathbf U=\begin{pmatrix}\mathbf u_2 & \dots & \mathbf u_n\end{pmatrix}$ and $(\mathbf\Lambda)_{ii}=\lambda_{i+1}$ for $0 = \lambda_1\leq \lambda_2\leq\dots\leq\lambda_n$, you can write $\mathbf L^\dagger=(\mathbf {U\Lambda^{-1/2}})(\mathbf {U\Lambda^{-1/2}})^\top$. So the rows of $\mathbf {U\Lambda^{-1/2}}$ are vectors representing the nodes of your original graph in a continuous Euclidean space, where squared distances are (up to the factor $\operatorname{vol}(G)$) the commute time distances in the graph.
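A small numerical check of this embedding (made-up graph; the squared embedding distance is compared directly against the pseudo-inverse kernel, without the $\operatorname{vol}(G)$ factor):

```python
import numpy as np

# Hypothetical small connected graph.
W = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
L = np.diag(W.sum(axis=1)) - W
Lp = np.linalg.pinv(L)          # Moore-Penrose pseudo-inverse

# Embedding: rows of U @ Lambda^{-1/2} over the nonzero eigenvalues.
vals, vecs = np.linalg.eigh(L)
U = vecs[:, 1:]                 # drop the constant eigenvector (lambda_1 = 0)
X = U / np.sqrt(vals[1:])       # row i = coordinates of node i

# Squared Euclidean distance in the embedding equals
# L+_ii + L+_jj - 2 L+_ij, i.e. commute time / vol(G).
i, j = 0, 3
d2 = np.sum((X[i] - X[j]) ** 2)
print(np.isclose(d2, Lp[i, i] + Lp[j, j] - 2 * Lp[i, j]))  # True
```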

The commute time distances are related to random walks. The commute time between node $v_i$ and node $v_j$ is the expected number of steps a random walk (a Markov chain on the graph) needs to go from $v_i$ to $v_j$ and back to $v_i$. A random walk has a higher probability of staying within a cluster than of leaving it; hence the commute times between nodes in different clusters are larger than those between nodes in the same cluster.
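One can verify this on a small made-up graph by computing the walk's expected hitting times from a linear system and comparing their sum with the pseudo-inverse formula, commute time $= \operatorname{vol}(G)\,(\mathbf L^\dagger_{ii} + \mathbf L^\dagger_{jj} - 2\mathbf L^\dagger_{ij})$ (a sketch, not from the paper):

```python
import numpy as np

# Hypothetical small connected graph; P = D^{-1} W is the transition matrix.
W = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
d = W.sum(axis=1)
P = W / d[:, None]
n = len(W)

def hitting_time(P, target):
    """Expected number of steps to first reach `target` from every node."""
    idx = [i for i in range(n) if i != target]
    # For k != target: h_k = 1 + sum_m P_km h_m, with h_target = 0.
    A = np.eye(n - 1) - P[np.ix_(idx, idx)]
    h = np.zeros(n)
    h[idx] = np.linalg.solve(A, np.ones(n - 1))
    return h

i, j = 0, 3
commute = hitting_time(P, j)[i] + hitting_time(P, i)[j]

# Compare with vol(G) * (L+_ii + L+_jj - 2 L+_ij).
Lp = np.linalg.pinv(np.diag(d) - W)
kernel = d.sum() * (Lp[i, i] + Lp[j, j] - 2 * Lp[i, j])
print(np.isclose(commute, kernel))  # True
```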

We usually drop the scaling $\mathbf {\Lambda^{-1/2}}$ on the axes because of another result, explained in (von Luxburg 2007). Basically, when the graph is disconnected into $k$ components, the eigenvectors of the $k$ smallest eigenvalues span the indicator vectors of the components. From there, matrix perturbation theory states that if the perturbation (the edges weakly connecting the components) is not too large, the eigenvalues and eigenvectors do not change much either.
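A sketch of the disconnected ideal case (two triangles, no connecting edges): the eigenvalue $0$ has multiplicity $k=2$, and the corresponding eigenvectors are constant on each component, i.e. they span the component indicators.

```python
import numpy as np

# Two disconnected triangles: the graph has k = 2 connected components.
B = np.ones((3, 3)) - np.eye(3)
W = np.block([[B, np.zeros((3, 3))], [np.zeros((3, 3)), B]])
L = np.diag(W.sum(axis=1)) - W

vals, vecs = np.linalg.eigh(L)
n_zero = int(np.sum(np.abs(vals) < 1e-9))
print(n_zero)  # 2: one zero eigenvalue per component

# Any basis of the 0-eigenspace is constant on each component, so the
# rows of the 2-dimensional embedding coincide within a component.
Y = vecs[:, :2]
print(np.allclose(Y[0], Y[2]), np.allclose(Y[3], Y[5]))  # True True
```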

And voilà, there is your heuristic. It is a loose one, don't forget that: there is no bound on how good the approximation is with respect to the optimal graph cut.

Notice that $\mathbf u_i \neq \mathbf y_i$ when the Laplacian you're using is not symmetric.