What are the benefits of using spectral k-means over simple k-means? And how does spectral k-means overcome the local-minimum problem of k-means?


I have understood why k-means gets stuck in local minima.

Now I am curious to know how spectral k-means helps to avoid this local-minima problem.

According to this paper, A Tutorial on Spectral Clustering,

the spectral clustering algorithm proceeds as follows:

  1. Represent the data as points in $\mathbb{R}^n$

  2. Define an affinity matrix $A$, using a Gaussian kernel $K$ or an
     adjacency matrix

  3. Construct the graph Laplacian from $A$ (i.e. decide on a normalization)

  4. Solve the eigenvalue problem

  5. Select the $k$ eigenvectors corresponding to the $k$ lowest (or highest) eigenvalues to define a $k$-dimensional subspace

  6. Form clusters in this subspace using k-means
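For concreteness, the steps above can be sketched with plain NumPy. This is only an illustrative implementation under my own assumptions (the `spectral_kmeans` name, the choice of the symmetrically normalized Laplacian, the row normalization, and the deterministic farthest-point seeding for the final k-means are not prescribed by the paper):

```python
import numpy as np

def spectral_kmeans(X, k, sigma=1.0, n_iter=50):
    # Step 2: affinity matrix A from a Gaussian kernel
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # Step 3: symmetrically normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    # Steps 4-5: eigenvectors of the k smallest eigenvalues span the subspace
    _, vecs = np.linalg.eigh(L)               # eigh sorts eigenvalues ascending
    U = vecs[:, :k].copy()
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding

    # Step 6: plain Lloyd k-means in the embedded space, seeded
    # deterministically by greedy farthest-point initialization
    idx = [0]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(U - U[i], axis=1) for i in idx], axis=0)
        idx.append(int(np.argmax(dists)))
    centers = U[idx]
    for _ in range(n_iter):
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        centers = np.array([U[labels == j].mean(axis=0) for j in range(k)])
    return labels
```

On two concentric circles (e.g. radii 1 and 5) with `sigma=1.0`, this recovers the two rings, which k-means on the raw coordinates cannot do.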

Step 6 still uses k-means, so how does spectral clustering overcome the local-minima problem of k-means? Moreover, what are the benefits of spectral clustering over plain k-means?

A detailed insight into this would be very helpful.

Edit: For k-means clustering, the decision boundary between the clusters with centers $C_{1}$ and $C_{2}$ is linear.

Proof of this is quite intuitive: points on the decision boundary are equidistant from the two centers $C_{1}$ and $C_{2}$. Hence,
\begin{gathered}
\left\|p-C_{1}\right\|^{2}=\left\|p-C_{2}\right\|^{2} \\
\|p\|^{2}+\left\|C_{1}\right\|^{2}-2 C_{1} \cdot p=\|p\|^{2}+\left\|C_{2}\right\|^{2}-2 C_{2} \cdot p \\
2\left(C_{1}-C_{2}\right) \cdot p+\left\|C_{2}\right\|^{2}-\left\|C_{1}\right\|^{2}=0 \\
\left(C_{1}-C_{2}\right) \cdot p+\frac{\left\|C_{2}\right\|^{2}-\left\|C_{1}\right\|^{2}}{2}=0 \\
W \cdot p+C=0,
\end{gathered}
which is a hyperplane with $W=C_{1}-C_{2}$ and $C=\frac{\left\|C_{2}\right\|^{2}-\left\|C_{1}\right\|^{2}}{2}$. These linear decision boundaries mean that k-means produces a Voronoi tessellation of the space. The problem arises when the data points are arranged non-linearly, so that the clusters are not linearly separable. This is where spectral clustering comes into the picture.
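The hyperplane formula can be sanity-checked numerically; a throwaway sketch (the variable names and the random centers are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
C1, C2 = rng.normal(size=3), rng.normal(size=3)   # two cluster centers in R^3
W = C1 - C2
C = (np.dot(C2, C2) - np.dot(C1, C1)) / 2

# Project a random point onto the hyperplane W.p + C = 0 ...
p = rng.normal(size=3)
p -= (np.dot(W, p) + C) / np.dot(W, W) * W

# ... and confirm it lies on the plane and is equidistant from both centers.
assert abs(np.dot(W, p) + C) < 1e-9
assert abs(np.linalg.norm(p - C1) - np.linalg.norm(p - C2)) < 1e-9
```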


My doubt arises here: does spectral clustering project the data points into a higher-dimensional space, like kernel methods do, so that the clusters become easier to separate?

1 answer below.

I think the main advantage of spectral clustering is that it does not necessarily result in a Voronoi partition of the sample space.

Consider standard k-means: once we have found our centers $\mu_1,\dots,\mu_k \in \mathbb{R}^d$ (they may correspond to a local minimum of the cost function, but we might as well assume they are the true minimizer), the $i$th cluster is given by $$C_i := \Big\{ x \in \mathbb{R}^d : \|x - \mu_i \| < \min_{j \neq i } \|x - \mu_j\|\Big\}$$ This is a convex set (as you can easily check), and thus we have the potentially undesirable restriction that our "true" clusters have to be compatible with a Voronoi partition of $\mathbb{R}^d$ for k-means to make sense. You can find examples where k-means fails as a result of this restriction in the answers to this post.
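The convexity claim is easy to probe empirically. A quick sketch (the centers are arbitrary, and `assign` is a hypothetical helper implementing the nearest-center rule):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = rng.normal(size=(3, 2))            # three fixed centers in R^2

def assign(x):
    # nearest-center rule that defines the k-means cells
    return int(np.argmin(((mu - x) ** 2).sum(axis=1)))

# If x and y fall in the same cell, every point on the segment between
# them falls in that cell too: each cell is an intersection of
# half-spaces, hence convex.
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    if assign(x) == assign(y):
        for t in (0.25, 0.5, 0.75):
            assert assign(t * x + (1 - t) * y) == assign(x)
```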

Now to spectral clustering. I think one way to see that non-convex partitions are not an issue is to note the equivalence of spectral clustering and kernel k-means (see for example this paper). Kernel k-means is essentially ordinary k-means, but applied to the feature map $\phi(x)$ instead of to $x$.

If for example we use the Gaussian kernel $K(x,y) = e^{-\sigma^2\|x-y\|^2}$, then $\phi(x)$ will be an element of an infinite-dimensional space.

Kernel k-means will still result in linear separation between two clusters, but in this infinite-dimensional space. Back in the sample space, the preimages of these separating hyperplanes can be much more complicated (non-convex) structures, as you might know from (kernel) SVMs.
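To make this concrete, here is a minimal kernel k-means sketch that runs Lloyd-style updates using only the Gram matrix, never forming $\phi(x)$ explicitly. The expansion of $\|\phi(x_i)-m_j\|^2$ in kernel evaluations is the standard trick, but this exact implementation (name, signature, no empty-cluster handling) is my own illustration:

```python
import numpy as np

def kernel_kmeans(K, k, labels, n_iter=30):
    """Lloyd-style kernel k-means on a precomputed Gram matrix K.

    Distances to cluster means in feature space need only kernel values:
    ||phi(x_i) - m_j||^2 = K_ii - 2*mean_{l in C_j} K_il
                                + mean_{l,m in C_j} K_lm
    (No handling of empty clusters -- illustration only.)
    """
    n = K.shape[0]
    for _ in range(n_iter):
        dist = np.empty((n, k))
        for j in range(k):
            mask = labels == j
            nj = mask.sum()
            dist[:, j] = (np.diag(K)
                          - 2 * K[:, mask].sum(axis=1) / nj
                          + K[np.ix_(mask, mask)].sum() / nj ** 2)
        labels = dist.argmin(axis=1)
    return labels
```

With a Gaussian Gram matrix on two concentric rings, the ring partition is a fixed point of these updates, and a slightly perturbed labeling flows back to it, even though no linear boundary in the plane separates the rings.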