I have understood why k-means can get stuck in local minima.
Now I am curious to know how spectral clustering helps to avoid this local-minima problem.
According to the paper *A Tutorial on Spectral Clustering*,
the spectral algorithm proceeds as follows:
1. Project the data into $\mathbb{R}^n$.
2. Define an affinity matrix $A$, using a Gaussian kernel $K$ or an adjacency matrix.
3. Construct the graph Laplacian from $A$ (i.e. decide on a normalization).
4. Solve the eigenvalue problem.
5. Select $k$ eigenvectors corresponding to the $k$ lowest (or highest) eigenvalues to define a $k$-dimensional subspace.
6. Form clusters in this subspace using k-means.
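The six steps above can be sketched in a few lines. This is a minimal illustration and not the paper's reference implementation: it assumes numpy, scipy and scikit-learn are available, picks the unnormalized Laplacian $L = D - A$ as the normalization choice in step 3, and the function name `spectral_clustering` is my own.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Minimal sketch of the six steps (unnormalized-Laplacian variant)."""
    # Step 2: affinity matrix via a Gaussian kernel
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # Step 3: graph Laplacian L = D - A (one possible normalization choice)
    D = np.diag(A.sum(axis=1))
    L = D - A

    # Steps 4-5: eigenvectors for the k smallest eigenvalues of L
    _, vecs = eigh(L, subset_by_index=[0, k - 1])

    # Step 6: k-means in the k-dimensional spectral embedding
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs)
```

For two well-separated groups the two smallest eigenvalues of $L$ are near zero and their eigenvectors act as (approximate) cluster indicators, so the final k-means run operates on an almost trivially separable embedding.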
In step 6 it uses k-means. How, then, does it overcome the local-minima problem of k-means? Moreover, what are the benefits of spectral clustering over k-means?
If someone could give detailed insight into this, it would be very helpful.
Edit: For k-means clustering, the decision boundary for whether a data point lies in the cluster of center $C_{1}$ or of center $C_{2}$ is linear.
The proof of this is quite intuitive. Points on the decision boundary are equidistant from both centers $C_{1}$ and $C_{2}$. Hence, \begin{gathered} \left\|p-C_{1}\right\|^{2}=\left\|p-C_{2}\right\|^{2} \\ \|p\|^{2}+\left\|C_{1}\right\|^{2}-2 C_{1} \cdot p=\|p\|^{2}+\left\|C_{2}\right\|^{2}-2 C_{2} \cdot p \\ 2\left(C_{1}-C_{2}\right) \cdot p+\left\|C_{2}\right\|^{2}-\left\|C_{1}\right\|^{2}=0 \\ \left(C_{1}-C_{2}\right) \cdot p+\frac{\left(\left\|C_{2}\right\|^{2}-\left\|C_{1}\right\|^{2}\right)}{2}=0 \\ W \cdot p+C=0 \\ \text { this is a hyperplane with } W=\left(C_{1}-C_{2}\right) \text { and } C=\frac{\left(\left\|C_{2}\right\|^{2}-\left\|C_{1}\right\|^{2}\right)}{2} \end{gathered} The decision boundaries of k-means clusters therefore lead to a Voronoi tessellation. A problem arises when the data points are arranged in a non-linear way, where linear separation is not possible. Hence spectral clustering comes into the picture.
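A quick numeric check of this hyperplane formula, with hypothetical centers chosen only for illustration:

```python
import numpy as np

# Hypothetical centers C1, C2 for illustration
C1, C2 = np.array([0.0, 0.0]), np.array([4.0, 2.0])
W = C1 - C2
C = (np.dot(C2, C2) - np.dot(C1, C1)) / 2

# A point equidistant from both centers satisfies W·p + C = 0;
# the midpoint is certainly equidistant:
p = (C1 + C2) / 2
assert np.isclose(np.dot(W, p) + C, 0.0)

# For an arbitrary point q, the sign of W·q + C tells us the nearer center:
q = np.array([1.0, 1.0])
closer_to_C1 = np.linalg.norm(q - C1) < np.linalg.norm(q - C2)
assert closer_to_C1 == (np.dot(W, q) + C > 0)
```

The sign test follows from rearranging the first line of the derivation: $\|q-C_1\|^2 - \|q-C_2\|^2 = -2(W \cdot q + C)$, so points with $W \cdot q + C > 0$ are strictly closer to $C_1$.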
My doubt arises here: does spectral clustering project the data points into a higher dimension, like kernel functions do, so that it becomes easier to separate the clusters?
I think the main advantage of spectral clustering is the fact that it does not necessarily result in a Voronoi partition of the sample space.
Consider standard k-means: Once we have found our centers $\mu_1,\dots,\mu_k \in \mathbb{R}^d$ (they may correspond to a local minimum of the cost function, but we might as well assume they are the true minimizer), the $i$th cluster is given by $$C_i := \Big\{ x : \|x - \mu_i \| < \min_{j \neq i } \|x - \mu_j\|\Big\}$$ This is a convex set (as you can easily check), and thus we have the potentially undesirable restriction that our "true" clusters will have to be associated with a Voronoi partition of $\mathbb{R}^d$ for k-means to make sense. You can find examples where k-means fails as a result of this restriction in the answers to this post.
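This restriction is easy to demonstrate on synthetic data. Here is a small sketch (assuming scikit-learn is available) on two concentric circles: the true clusters are the rings, which are not convex, so no Voronoi partition can recover them and k-means must cut at least one ring in two.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_circles

# Two concentric rings: the "true" clusters are not convex
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Because each k-means cluster is convex, the recovered labels cannot
# match the rings; take the better of the two label assignments:
agreement = max(np.mean(labels == y), np.mean(labels != y))
print(f"best label agreement with the true rings: {agreement:.2f}")
```

In practice k-means ends up splitting the data with a single hyperplane through both rings, so the agreement stays close to chance level rather than 1.0.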
Now to spectral clustering. I think one way to see that non-convex partitions are not an issue is to note the equivalence of spectral clustering and kernel k-means (see for example this paper). Kernel k-means is essentially ordinary k-means, but with respect to the feature map $\phi(x)$ instead of $x$.
If for example we use the Gaussian kernel $K(x,y) = e^{-\sigma^2\|x-y\|^2}$, then $\phi(x)$ will be an element of an infinite-dimensional space.
Kernel k-means will still result in a linear separation between two clusters, but in this infinite-dimensional space. Back in the sample space, the preimages of these separating affine hyperplanes can be much more complicated (non-convex) structures, as you might know from (kernel) SVMs.
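To see the contrast with plain k-means, here is a sketch (assuming scikit-learn) of spectral clustering with a Gaussian (RBF) affinity on the same kind of two-rings data; the `gamma` value is a hypothetical choice of kernel width, roughly playing the role of $\sigma^2$ above.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles

# The same two-rings data on which plain k-means fails
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Gaussian (RBF) affinity; gamma is an illustrative kernel-width choice
labels = SpectralClustering(n_clusters=2, affinity="rbf", gamma=20.0,
                            random_state=0).fit_predict(X)

agreement = max(np.mean(labels == y), np.mean(labels != y))
print(f"agreement with the true rings: {agreement:.2f}")
```

With a suitable kernel width the affinity graph connects points along each ring but barely connects the two rings, so the spectral embedding separates them even though the clusters are non-convex in the sample space.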