How exactly is the prototype clustering solved using gradient descent?

[Image: lecture slide stating the prototype clustering objective]

I don't see any derivatives.

1 Answer

As your slide shows, you can define a loss function such as $$ L(C) = \sum_{c\in C} \sum_{x\in c} \|x - \mu_c\|^2 $$ and then perform (stochastic) gradient descent on it. The derivatives are there, just implicit: the gradient with respect to a prototype is $\nabla_{\mu_c} L = -2\sum_{x\in c}(x - \mu_c)$, so a stochastic update on a single sample $x$ assigned to cluster $c$ is $\mu_c \leftarrow \mu_c + \eta\,(x - \mu_c)$ (the factor $2$ is absorbed into the learning rate $\eta$).
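To make this concrete, here is a minimal NumPy sketch of SGD on that objective (my own illustration, not from the slide), assuming the squared-Euclidean loss: for each sample, find its nearest prototype (the assignment step) and pull that prototype toward the sample by a small step (the gradient step).

```python
import numpy as np

def sgd_kmeans(X, k, lr=0.05, epochs=10, seed=0):
    """SGD on the k-means loss L = sum_x ||x - mu_c(x)||^2.

    The gradient of L w.r.t. the winning prototype mu_c is
    -2 (x - mu_c), so each per-sample update is
    mu_c += lr * (x - mu_c), with the factor 2 folded into lr.
    """
    rng = np.random.default_rng(seed)
    # Initialize prototypes from k distinct random data points.
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x = X[i]
            # Assignment: nearest prototype (hard E-step analogue).
            c = np.argmin(((mu - x) ** 2).sum(axis=1))
            # Stochastic gradient step on the winning prototype.
            mu[c] += lr * (x - mu[c])
    return mu
```

With a constant learning rate the prototypes hover around the cluster means rather than converging exactly; a decaying rate (e.g. $\eta_t = 1/t$ per prototype) recovers the classic online $k$-means behavior.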

As for $k$-means itself, people usually prefer Lloyd's algorithm, an expectation-maximization-style alternating procedure. Part of the reason for this (see also this thread) is that the classic algorithm turns out to be equivalent to Newton's method (see Bottou and Bengio (1995), "Convergence properties of the k-means algorithms"). See Sculley's "Web-Scale K-Means Clustering" for more references.

You only want to use SGD for $k$-means when the dataset is so large as to render the EM method infeasible. Generally speaking, the results of the SGD approach are of inferior quality, but quickly obtained.
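For the large-dataset regime, Sculley's paper proposes a mini-batch variant with a per-center learning rate of $1/\text{count}$, so each center is the running mean of the points ever assigned to it. A rough NumPy sketch in that spirit (an illustration of the idea, not Sculley's exact implementation):

```python
import numpy as np

def minibatch_kmeans(X, k, batch_size=32, n_iter=200, seed=0):
    """Mini-batch k-means in the spirit of Sculley (2010).

    Each center keeps a count of points assigned to it and uses the
    per-center learning rate 1/count, so update sizes shrink as a
    center accumulates evidence.
    """
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size=batch_size)]
        # Assign each batch point to its nearest center.
        d = ((batch[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for x, c in zip(batch, assign):
            counts[c] += 1
            mu[c] += (x - mu[c]) / counts[c]  # per-center rate 1/count
    return mu
```

Because each pass touches only a small batch, the cost per iteration is independent of the dataset size, which is exactly the trade-off described above: lower-quality solutions, obtained quickly.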

However, there are other "prototype clustering" approaches that rely more heavily on SGD. See, for instance, Cardot et al. (2011), "A fast and recursive algorithm for clustering large datasets with $k$-medians". You might also look up the $k$-prototypes algorithm.