How exactly is the prototype clustering solved using gradient descent?

[Image: lecture slide stating the prototype clustering objective]

I don't see any derivatives.

1 Answer

As your slide shows, you can define a loss function such as $$ L(C) = \sum_{c\in C} \sum_{x\in c} \|x - \mu_c\|^2 $$ and then perform (stochastic) gradient descent on it. The derivatives are there, just implicit: the gradient with respect to a prototype is $\nabla_{\mu_c} L = -2\sum_{x\in c}(x - \mu_c)$, so a stochastic update on a single sample $x$ assigned to cluster $c$ is $\mu_c \leftarrow \mu_c + \eta\,(x - \mu_c)$ (the factor $2$ is absorbed into the learning rate $\eta$).
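To make this concrete, here is a minimal NumPy sketch of SGD on that objective (my own illustration, not from the slide), assuming the squared-Euclidean loss: for each sample, find its nearest prototype (the assignment step) and pull that prototype toward the sample by a small step (the gradient step).

```python
import numpy as np

def sgd_kmeans(X, k, lr=0.05, epochs=10, seed=0):
    """SGD on the k-means loss L = sum_x ||x - mu_c(x)||^2.

    The gradient of L w.r.t. the winning prototype mu_c is
    -2 (x - mu_c), so each per-sample update is
    mu_c += lr * (x - mu_c), with the factor 2 folded into lr.
    """
    rng = np.random.default_rng(seed)
    # Initialize prototypes from k distinct random data points.
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x = X[i]
            # Assignment: nearest prototype (hard E-step analogue).
            c = np.argmin(((mu - x) ** 2).sum(axis=1))
            # Stochastic gradient step on the winning prototype.
            mu[c] += lr * (x - mu[c])
    return mu
```

With a constant learning rate the prototypes hover around the cluster means rather than converging exactly; a decaying rate (e.g. $\eta_t = 1/t$ per prototype) recovers the classic online $k$-means behavior.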

As for $k$-means itself, people usually prefer Lloyd's algorithm, an expectation-maximization-style alternating procedure. Part of the reason for this (see also this thread) is that the classic algorithm turns out to be equivalent to Newton's method (see Bottou and Bengio (1995), "Convergence properties of the k-means algorithms"). See Sculley's "Web-Scale K-Means Clustering" for more references.

You only want to use SGD for $k$-means when the dataset is so large as to render the EM method infeasible. Generally speaking, the results of the SGD approach are of inferior quality, but quickly obtained.
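For the large-dataset regime, Sculley's paper proposes a mini-batch variant with a per-center learning rate of $1/\text{count}$, so each center is the running mean of the points ever assigned to it. A rough NumPy sketch in that spirit (an illustration of the idea, not Sculley's exact implementation):

```python
import numpy as np

def minibatch_kmeans(X, k, batch_size=32, n_iter=200, seed=0):
    """Mini-batch k-means in the spirit of Sculley (2010).

    Each center keeps a count of points assigned to it and uses the
    per-center learning rate 1/count, so update sizes shrink as a
    center accumulates evidence.
    """
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size=batch_size)]
        # Assign each batch point to its nearest center.
        d = ((batch[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for x, c in zip(batch, assign):
            counts[c] += 1
            mu[c] += (x - mu[c]) / counts[c]  # per-center rate 1/count
    return mu
```

Because each pass touches only a small batch, the cost per iteration is independent of the dataset size, which is exactly the trade-off described above: lower-quality solutions, obtained quickly.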

However, there are other "prototype clustering" approaches that rely more heavily on SGD. See, for instance, Cardot et al. (2011), "A fast and recursive algorithm for clustering large datasets with $k$-medians". You might also look up the $k$-prototypes algorithm.