The distance formula used in Kernel K-means in tslearn


I am reading the documentation of the class tslearn.clustering.KernelKMeans and found its source code. I have a question about the function _compute_dist from the source, which I quote below:

    def _compute_dist(self, K, dist):
        """Compute a n_samples x n_clusters distance matrix using the kernel
        trick."""
        sw = self.sample_weight_

        for j in range(self.n_clusters):
            mask = (self.labels_ == j)

            if numpy.sum(mask) == 0:
                raise EmptyClusterError("try smaller n_cluster or better "
                                        "kernel parameters")

            # NB: we use a normalized kernel so k(x,x) = 1 for all x
            # (including the centroid)
            dist[:, j] = 2 - 2 * numpy.sum(sw[mask] * K[:, mask],
                                           axis=1) / sw[mask].sum()
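To experiment with the quoted method outside the class, here is a standalone sketch. The class attributes (sample_weight_, labels_, n_clusters) are passed in as plain arguments, and the function name compute_dist and the toy kernel matrix are my own assumptions, not part of tslearn:

```python
import numpy

def compute_dist(K, labels, sample_weight, n_clusters):
    """Sketch of a standalone version of KernelKMeans._compute_dist:
    compute an (n_samples, n_clusters) distance matrix via the kernel trick,
    assuming a normalized kernel with K(x, x) = 1 for all x."""
    n_samples = K.shape[0]
    dist = numpy.empty((n_samples, n_clusters))
    for j in range(n_clusters):
        mask = labels == j
        if numpy.sum(mask) == 0:
            raise ValueError("empty cluster: try a smaller n_clusters "
                             "or better kernel parameters")
        # Weighted mean of K(y, x_i) over the points x_i in cluster j
        dist[:, j] = 2 - 2 * numpy.sum(
            sample_weight[mask] * K[:, mask], axis=1) / sample_weight[mask].sum()
    return dist

# Toy example: 4 samples, 2 clusters, unit sample weights, and a kernel
# matrix with ones on the diagonal (the normalization the code assumes).
K = numpy.array([[1.0, 0.5, 0.0, 0.0],
                 [0.5, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.8],
                 [0.0, 0.0, 0.8, 1.0]])
labels = numpy.array([0, 0, 1, 1])
dist = compute_dist(K, labels, numpy.ones(4), 2)
# dist[0, 0] = 2 - 2 * (1 + 0.5) / 2 = 0.5
```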

Suppose $\phi$ is the feature map for the kernel $K$. Notice that $K(x,y)=\phi(x)^\top\phi(y)$, and assume the normalization condition $K(x,x)=1$ for all $x$. Now, for a point $y$ and a cluster composed of points $\{x_i\}_{i=1}^n$, the distance is computed via the kernel trick as

\begin{align*} \left\|\phi(y)-\underbrace{\frac{1}{n}\sum_{i=1}^n\phi(x_i)}_{centroid}\right\|_2^2 &= K(y,y) - \frac{2}{n}\sum_{i=1}^n K(y,x_i) + \frac{1}{n^2}\sum\limits_{i=1}^n\sum\limits_{j=1}^n K(x_i,x_j)\\ &= 1 - \frac{2}{n}\sum_{i=1}^n K(y,x_i) + \frac{1}{n^2}\sum\limits_{i=1}^n\sum\limits_{j=1}^n K(x_i,x_j). \end{align*}

Comparing with the code, it seems that the term $$\frac{1}{n^2}\sum\limits_{i=1}^n\sum\limits_{j=1}^n K(x_i,x_j)$$ should equal $1$. But I do not think that holds in general. Am I missing anything? Thank you.
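A quick numerical check supports this observation. The sketch below (my own example, using a Gaussian/RBF kernel, which satisfies $K(x,x)=1$) shows that the within-cluster term $\frac{1}{n^2}\sum_{i,j} K(x_i,x_j)$ is generally not $1$:

```python
import numpy

# Sketch: for a normalized kernel (K(x, x) = 1 on the diagonal), check
# whether (1/n^2) * sum_ij K(x_i, x_j) equals 1 for a random cluster.
rng = numpy.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 points forming one cluster

# Gaussian/RBF kernel: K(x, y) = exp(-||x - y||^2), so K(x, x) = 1
sq_dists = numpy.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = numpy.exp(-sq_dists)

within = K.mean()  # (1/n^2) * sum_ij K(x_i, x_j)
print(within)      # strictly less than 1 unless all points coincide
```

Since every off-diagonal entry $e^{-\|x_i-x_j\|^2}$ is strictly below $1$ for distinct points, the average is strictly below $1$, confirming that the double-sum term cannot be replaced by $1$ exactly.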

1 Answer

In k-means, to reassign a point $\mathbf{y}$ you need to compute its distance $d(\mathbf{y},\mathbf{c}_j)$ to the $j$th cluster centroid. For the squared Euclidean distance, this expands as $$ d(\mathbf{y},\mathbf{c}_j) = \| \mathbf{y} \|^2 + \| \mathbf{c}_j \|^2 - 2\, \mathbf{y}^\top \mathbf{c}_j. $$ With the kernel trick, the code computes $$ d(\mathbf{y},\mathbf{c}_j) = k(\mathbf{y},\mathbf{y}) + k(\mathbf{c}_j,\mathbf{c}_j) - 2\, k(\mathbf{y},\mathbf{c}_j) = 2 - 2\, k(\mathbf{y},\mathbf{c}_j) = 2 - \frac{2}{n} \sum_{i=1}^{n} k(\mathbf{y},\mathbf{x}_i), $$ since a normalized kernel is used: the code's comment asserts $k(x,x)=1$ for all $x$, *including the centroid*, i.e. it treats $k(\mathbf{c}_j,\mathbf{c}_j)$ as $1$ instead of computing the within-cluster double sum exactly.
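To make the relation between the exact expansion and the code's formula concrete, the following sketch (my own example, with an RBF kernel and unit sample weights as assumptions) computes both. For each cluster, the gap between the two is the constant $1 - \frac{1}{n^2}\sum_{i,j} K(x_i,x_j)$, i.e. exactly the term the question asks about:

```python
import numpy

rng = numpy.random.default_rng(1)
X = rng.normal(size=(8, 2))
labels = numpy.array([0, 0, 0, 1, 1, 1, 1, 0])

# RBF kernel: normalized on the diagonal, K(x, x) = 1
sq = numpy.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = numpy.exp(-sq)

ok = []
for j in (0, 1):
    mask = labels == j
    approx = 2 - 2 * K[:, mask].mean(axis=1)          # the code's formula
    within = K[numpy.ix_(mask, mask)].mean()          # (1/n^2) sum_ij K(x_i, x_j)
    exact = 1 - 2 * K[:, mask].mean(axis=1) + within  # full kernel-trick expansion
    # The gap between the two is the per-cluster constant 1 - within
    ok.append(numpy.allclose(approx - exact, 1 - within))
```

Note that this gap is constant across points for a fixed cluster but can differ between clusters, which is why it only cancels exactly when the normalization assumption $k(\mathbf{c}_j,\mathbf{c}_j)=1$ holds.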