How to see that K-means objective is non-convex?

Question

12.9k Views Asked by Bumbble Comm At 28 Mar 2026 - 3:26

I'm trying to prove that the objective of the K-means clustering algorithm is non-convex.

The objective is given as $J(U,Z) = \|X-UZ\|_F^2$, with $X \in\mathbb{R}^{m\times n}$, $U\in \mathbb{R}^{m\times k}$, $Z \in \{0,1\}^{k\times n}$. $Z$ represents an assignment matrix whose columns each sum to 1, i.e. $\sum_{i=1}^{k} z_{i,j} = 1$ for every column $j$.

First, is there an easy way to see that $J$ is non-convex?

As I do not see why it is non-convex, I tried to compute the Hessian and ended up with: $J_{UU} = J_{ZZ} = 0$

$J_{ZU} = -2(X-U)$

$J_{UZ} = -2$

Then I see for example that the Hessian is not positive semidefinite for $X=U$.

Can anyone verify whether my argument is correct, or provide a quicker way to see that it is non-convex?

Thanks in advance! Stefan

**Bumbble Comm** · Answer 1 · 2013-08-11 14:54:41

Since $Z$ is discrete, you cannot differentiate with respect to it. And the second derivative with respect to $U$ is not zero (the function is quadratic in $U$).

Since your objective function is not defined on a convex set, it cannot be convex. You may want to rewrite it as $J(U) = \min\limits_Z \left\| X - U Z \right\| ^2$.

To prove that it is not convex in general, it suffices to consider a special case (e.g., 4 points, in dimension 1, at coordinates $0$, $1$, $2$, $3$) and notice that there are several local minima (for $k=2$, the centers could be $.5$ and $2.5$, or $2.5$ and $.5$). In this example, you can also plot the function $J(u_1,u_2)$ (a surface), to better understand what happens.
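The four-point example can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `kmeans_objective` is made up for illustration. It evaluates $J(u_1,u_2)$ at the two minimizers and at their midpoint. A convex function could not be larger at the midpoint than the average of the two endpoint values.

```python
import numpy as np

def kmeans_objective(centers, points):
    """J(u) = sum_i min_j (x_i - u_j)^2 for 1-D data (hypothetical helper)."""
    centers = np.asarray(centers, dtype=float)
    points = np.asarray(points, dtype=float)
    # Squared distance from every point to every center: shape (n_points, n_centers)
    sq_dist = (points[:, None] - centers[None, :]) ** 2
    # Each point contributes its distance to the closest center only
    return sq_dist.min(axis=1).sum()

points = [0.0, 1.0, 2.0, 3.0]
a = np.array([0.5, 2.5])   # one minimizer
b = np.array([2.5, 0.5])   # the same centers with labels swapped
mid = 0.5 * (a + b)        # midpoint (1.5, 1.5): both centers coincide

j_a = kmeans_objective(a, points)
j_b = kmeans_objective(b, points)
j_mid = kmeans_objective(mid, points)
print(j_a, j_b, j_mid)     # 1.0 1.0 5.0
```

Since $J(\text{mid}) = 5 > 1 = \tfrac12 J(a) + \tfrac12 J(b)$, the convexity inequality fails on the segment between the two minimizers.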

**Bumbble Comm** · Answer 2 · 2017-09-29 01:49:14

I happen to be learning $k$-means these days. "Not convex" you mean. Putting notations aside, what $k$-means does is basically,

1) Given a $k$-partition of the data set, find the $k$ mean values of each subset in the partition;

2) Given the $k$ mean values, redefine the partition by putting each data point into the subset with the closest mean to it,

and repeat until a convergent result is achieved. Yes, in terms of the partition, the problem is discrete and one cannot apply the convex optimization theory to it. You're trying to introduce an assignment matrix to enlarge the feasible space of discrete partitions to continuous assignment weights. A more straightforward view is in terms of the $k$ mean values, which are continuous to begin with. Here comes the notation part.
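Before the notation, steps 1) and 2) above can be sketched in code. This is only an illustrative sketch, assuming NumPy; the function name `lloyd`, the `max_iter` cap, and the choice to keep the old center when a subset becomes empty are assumptions, not part of the description above.

```python
import numpy as np

def lloyd(points, centers, max_iter=100):
    """Alternate steps 2) and 1) until the means stop moving (illustrative sketch)."""
    points = np.asarray(points, dtype=float)            # shape (n, d)
    centers = np.asarray(centers, dtype=float).copy()   # shape (k, d)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iter):
        # Step 2): put each point into the subset with the closest mean
        sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Step 1): recompute the mean of each subset
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members) > 0:  # assumption: an empty subset keeps its old mean
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break                 # means did not move: converged
        centers = new_centers
    return centers, labels

# Four points on a line, with an initial guess at the two extremes
data = [[0.0], [1.0], [2.0], [3.0]]
final_centers, final_labels = lloyd(data, [[0.0], [3.0]])
print(final_centers.ravel(), final_labels)
```

With this starting guess the iteration stops at the means $0.5$ and $2.5$. Other data sets and starting guesses may stop at a different fixed point.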

Let $\vec{x}_i, i=1,2,...,n$ be the data points and $\vec{\mu}_j, j=1,2,...,k$ be the $k$ mean values. Then rule 2) allows us to formulate the minimization problem as

$$\mbox{minimize }\,\sum_{i=1}^n\min_{j=1..k}\Vert\vec{x}_i-\vec{\mu}_j\Vert^2,$$

with no constraints on the mean values $\vec{\mu}_j$. The problem resembles the NP-hard $k$-center problem in graph theory (though the maximum is replaced by a summation). Since the pointwise minimum of convex functions is in general not convex (the pointwise maximum is, though), the $k$-means objective is also non-convex in general. Think about $k=2$ for example, and assume our data points $x_i$ are in 1D space. Then one can see that each term

$$\min(|x_i-\mu_1|^2,|x_i-\mu_2|^2)$$

in terms of the two mean values $\mu_1$ and $\mu_2$ already has a corrugated shape, as one can see by plotting, for example, $z=\min(x^2,y^2)$ in 3D. The graph looks like a piece of corrugated cloth hung at its 4 corners. Then what $k$-means does is minimize the sum of diagonally shifted images of such corrugated functions. The result depends on a lucky initial guess.
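The corrugated shape can also be confirmed without a plot, using a midpoint test on $f(x,y)=\min(x^2,y^2)$. The two test points $(1,0)$ and $(0,1)$ are chosen here only for illustration:

$$f(1,0)=\min(1,0)=0,\qquad f(0,1)=\min(0,1)=0,\qquad f\left(\tfrac12,\tfrac12\right)=\min\left(\tfrac14,\tfrac14\right)=\tfrac14 .$$

Convexity would require $f\left(\tfrac12,\tfrac12\right)\le \tfrac12 f(1,0)+\tfrac12 f(0,1)=0$, but $\tfrac14>0$, so $f$ is not convex.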

To find a convex clustering algorithm, please check support vector clustering. The basic idea is to delineate a smooth boundary around the data points. As usual, one needs to specify a length scale for the Gaussian kernel (these come from the basics of support vector machines and the kernel trick) to define how smooth one wishes the boundary to be. Then if the data points cluster, the boundary naturally falls apart into disconnected components. One can then identify each component as a cluster. The number of clusters can be adjusted by tuning the length scale of the Gaussian kernel. I think the idea might have first come from the one-class SVM, initially intended for outlier (anomaly) detection. Then people noticed that the boundary need not be connected and identified its potential use in clustering problems.
