I am reading the documentation for the class `tslearn.clustering.KernelKMeans` and looking at its source code. I have a question about the function `_compute_dist`, which I quote below:
```python
def _compute_dist(self, K, dist):
    """Compute a n_samples x n_clusters distance matrix using the kernel
    trick."""
    sw = self.sample_weight_
    for j in range(self.n_clusters):
        mask = (self.labels_ == j)
        if numpy.sum(mask) == 0:
            raise EmptyClusterError("try smaller n_cluster or better "
                                    "kernel parameters")
        # NB: we use a normalized kernel so k(x,x) = 1 for all x
        # (including the centroid)
        dist[:, j] = 2 - 2 * numpy.sum(sw[mask] * K[:, mask],
                                       axis=1) / sw[mask].sum()
```
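For concreteness, with uniform sample weights the quoted line reduces to a per-cluster mean over kernel columns. Here is a small self-contained sketch of that computation (plain NumPy with made-up data, not tslearn itself; the RBF kernel is used only because it satisfies $k(x,x)=1$):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))            # 6 points in R^2
labels = np.array([0, 0, 0, 1, 1, 1])  # two clusters of 3 points each

# RBF kernel, which satisfies k(x, x) = 1 for every data point
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq)

n_clusters = 2
dist = np.empty((len(X), n_clusters))
for j in range(n_clusters):
    mask = labels == j
    # with uniform weights, sum(sw * K[:, mask]) / sw.sum() is just a mean
    dist[:, j] = 2 - 2 * K[:, mask].mean(axis=1)
```

So each column of `dist` is $2 - 2\,\overline{K(y, x_i)}$, averaged over the points $x_i$ in cluster $j$.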
Suppose $\phi$ is the feature map for the kernel $K$. Note that $K(x,y)=\phi(x)^\top\phi(y)$, and assume the normalization condition $K(x,x)=1$ for all $x$. Now, for a point $y$ and a cluster composed of points $\{x_i\}_{i=1}^n$, the distance to the centroid is computed via the kernel trick as
\begin{align*} \left\|\phi(y)-\underbrace{\frac{1}{n}\sum_{i=1}^n\phi(x_i)}_{\text{centroid}}\right\|_2^2 &= K(y,y) - \frac{2}{n}\sum_{i=1}^n K(y,x_i) + \frac{1}{n^2}\sum\limits_{i=1}^n\sum\limits_{j=1}^n K(x_i,x_j)\\ &= 1 - \frac{2}{n}\sum_{i=1}^n K(y,x_i) + \frac{1}{n^2}\sum\limits_{i=1}^n\sum\limits_{j=1}^n K(x_i,x_j). \end{align*}
Comparing with the code, it seems the term $$\frac{1}{n^2}\sum\limits_{i=1}^n\sum\limits_{j=1}^n K(x_i,x_j)$$ would have to equal $1$. But I do not think that holds in general. Am I missing something? Thank you.
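Indeed, a quick numeric check (using an RBF kernel, chosen here only because it satisfies $k(x,x)=1$ on the diagonal) shows that this within-cluster term is generally below $1$:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(5, 3))  # one cluster of 5 points

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq)              # k(x_i, x_i) = 1 on the diagonal

n = len(X)
within = K.sum() / n**2      # (1/n^2) * sum_i sum_j K(x_i, x_j)
print(within)                # strictly less than 1 unless all points coincide
```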
In $k$-means, to reassign a point $\mathbf{y}$ you need the distance $d(\mathbf{y},\mathbf{c}_j)$ to each cluster centroid $\mathbf{c}_j$. For the squared Euclidean distance this reads
$$ d(\mathbf{y},\mathbf{c}_j) = \| \mathbf{y} \|^2 + \| \mathbf{c}_j \|^2 - 2\, \mathbf{y}^\top \mathbf{c}_j. $$
In the kernelized version you compute
$$ d(\mathbf{y},\mathbf{c}_j) = k(\mathbf{y},\mathbf{y}) + k(\mathbf{c}_j,\mathbf{c}_j) - 2\, k(\mathbf{y},\mathbf{c}_j) = 2 - 2\, k(\mathbf{y},\mathbf{c}_j) = 2 - \frac{2}{N_j} \sum_{i=1}^{N_j} k(\mathbf{y},\mathbf{x}_i), $$
where $N_j$ is the number of points in cluster $j$. The step $k(\mathbf{c}_j,\mathbf{c}_j)=1$ is exactly the assumption stated in the code comment: the kernel is normalized so that $k(x,x)=1$ for all $x$, including the centroid.
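As a sanity check that the quoted library line computes exactly this $2 - \frac{2}{N_j}\sum_i k(\mathbf{y},\mathbf{x}_i)$, here is a small NumPy sketch with made-up data and uniform sample weights (my own code, not tslearn's):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(6, 2))
labels = np.array([0, 0, 1, 1, 1, 0])
sw = np.ones(len(X))                   # uniform sample weights

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq)                        # normalized kernel: K[i, i] == 1

j = 1
mask = labels == j
# vectorized form, as in the library line
vec = 2 - 2 * np.sum(sw[mask] * K[:, mask], axis=1) / sw[mask].sum()
# explicit form, 2 - (2/N) * sum_i k(y, x_i)
N = mask.sum()
explicit = np.array([2 - 2 / N * sum(K[y, i] for i in np.where(mask)[0])
                     for y in range(len(X))])
assert np.allclose(vec, explicit)
```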