I am trying to understand why the centroid-based clustering is NP-hard.
Assume we have one-dimensional sorted data set $x_{1}, \dots, x_{n}$ and we want to find, say $k$, optimal clusters (intervals) $\mathcal{X}_{1},\dots,\mathcal{X}_{k}$ is a sense of sum of $l_{2}$-distances to the corresponding centroids.
Since we can not solve it in a polinomial time, then, I assume, there must be a case that in the optimal set of clusters $\mathcal{X}_{1},\dots,\mathcal{X}_{k}$ there could be clusters with elements such that:
$x_{1}, x_{2} \in \mathcal{X}_{i}$, $x_{3}\in \mathcal{X}_{j}$ such that $x_{1} > x_{3}$ and $x_{3}>x_{2}$, i.e. intervals do not necessarily contain consecutive elements of the original sorted set $x_{1}, \dots, x_{n}$.
Is this correct?