I'm trying to proof that the objective of the K-means clustering algorithm is non-convex.
The objective is given as $J(U,Z) = \|X-UZ\|_F^2$, with $X \in\mathbb{R}^{m\times n}, U\in \mathbb{R}^{m\times k}, \mathbb \{0,1\}^{k\times n}$. $Z$ represents an assignment matrix with a column sum of 1, i.e. $\sum_k z_{k,n} = 1$.
First, is there a easy way to see that $J$ is non-convex?
As I do not see why it is non-convex I tried to compute the hessian and ended up with: $J_{UU} = J_{ZZ} = 0$
$J_{ZU} = -2(X-U)$
$J_{UZ} = -2$
Then I see for example that the Hessian is not positive semidefinite for $X=U$.
Can anyone verify whether my argumentation is correct or provide a quicker way to see that it is non-convex?
Thanks in advance! Stefan
Since $Z$ is discrete, you cannot differentiate with respect to it. And the second derivative with respect to $U$ is not zero (the function is quadratic in $U$).
Since your objective function is not defined on a convex set, it cannot be convex. You may want to rewrite it as $J(U) = \min\limits_Z \left\| X - U Z \right\| ^2$.
To prove that it is not convex in general, it suffices to consider a special case (e.g., 4 points, in dimension 1, at coordinates $0$, $1$, $2$, $3$) and notice that there are several local minima (for $k=2$, the centers could be $.5$ and $2.5$, or $2.5$ and $.5$). In this example, you can also plot the function $J(u_1,u_2)$ (a surface), to better understand what happens.