How to prove that the residual sum of squares is a non-convex function?


The $K$-means algorithm uses a residual sum of squares (RSS), where $$ \mbox{RSS}_{k} = \sum_{d \in s_k}\|{d-c(s_k)}\|^2 $$

  • $\mbox{RSS} = \sum_{k= 1}^K \mbox{RSS}_{k}$ is the convergence criterion. $\mbox{RSS}$ is the objective function of $K$-means, and our goal is to minimize it.

  • $c(s) = \displaystyle\frac{1}{|s|} \sum_{d \in s}{d}$, where the sum runs over the datapoints $d$ belonging to cluster $s$.


Example. Suppose there are 5 points in a cluster: $(1,2), (2,1), (3,2), (2,2), (2,3)$. The mean point is $(2,2)$. The $\mbox{RSS}$ function takes that mean $(2,2)$, computes the squared Euclidean distance from it to each of the other 4 points, and sums them up (the mean point itself contributes zero):

$$\|(1,2) - (2,2)\|^2 + \|(2,1) - (2,2)\|^2 + \|(3,2) - (2,2)\|^2 + \|(2,3) - (2,2)\|^2$$
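A quick sketch of this computation in plain Python (the helper names `centroid` and `rss` are just illustrative, not from any particular library):

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def rss(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

cluster = [(1, 2), (2, 1), (3, 2), (2, 2), (2, 3)]
print(centroid(cluster))  # (2.0, 2.0)
print(rss(cluster))       # 1 + 1 + 1 + 0 + 1 = 4.0
```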


The $K$-means algorithm is described as follows:

  1. Choose the number of clusters (baskets) $K$.

  2. Initialize $K$ centroids by randomly selecting $K$ vectors from the input.

  3. Assign each vector to the closest centroid.

  4. Update each centroid by taking the average of the vectors belonging to the corresponding cluster.

  5. Repeat steps 3 and 4 until convergence.
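The five steps above can be sketched in plain Python as follows. This is a minimal illustration of Lloyd's algorithm, not a production implementation; the function name `kmeans` and the empty-cluster handling are my own choices:

```python
import random

def kmeans(data, k, iters=100, seed=0):
    """Minimal sketch of Lloyd's algorithm (steps 1-5 above).
    `data` is a list of coordinate tuples; returns (centroids, assignment)."""
    rng = random.Random(seed)
    # Step 2: initialize centroids by randomly sampling k input vectors.
    centroids = rng.sample(data, k)
    for _ in range(iters):
        # Step 3: assign each vector to the closest centroid.
        assignment = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2
                                  for x, c in zip(p, centroids[j])))
            for p in data
        ]
        # Step 4: update each centroid to the mean of its cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(data, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(xs) for xs in zip(*members)))
            else:
                # Keep the old centroid if a cluster empties out.
                new_centroids.append(centroids[j])
        # Step 5: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment

data = [(0, 0), (0, 1), (10, 10), (10, 11)]
cents, assign = kmeans(data, 2)
print(sorted(cents))  # [(0.0, 0.5), (10.0, 10.5)]
```

On this well-separated toy dataset any choice of two distinct starting points converges to the same two-cluster solution; on harder data the result depends on the initialization, which is exactly the local-optimum issue discussed below.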


I know that functions are either convex or non-convex. For a convex function, every local minimum is a global minimum, but a non-convex function can have local minima that are not global. Hence, $K$-means easily falls into the trap of those local minima. But I do not see why $\mbox{RSS}$ is a non-convex function. Can anyone please explain it to me?


EDIT: As suggested by Rodrigo, in this lecture it is stated that Lloyd's algorithm is not guaranteed to converge to the minimizer of RSS. Indeed, Lloyd's algorithm oftentimes gets stuck in local optima of RSS.

So from this I can assume that the function must be non-convex. But how does one prove it mathematically?


Let us write down the exact definition of the $k$-means objective, taken from Wikipedia or the lecture notes you provided:

Given a set of observations $x_1,\ldots, x_n\in\mathbb R^d$, $k$-means clustering aims at partitioning the $n$ observations into $k$ sets $\mathbf S =\{S_1,\ldots,S_k\}$ so as to minimize the within-cluster sum of squares. Formally, the objective is to find $$\mathbf S^* =\arg\min_{\{\text{all possible }\mathbf S\}}\sum_{i=1}^{k}\sum_{x\in S_i}\|x - \mu_i\|^2 \tag1$$ where $\mu_i := \frac{1}{|S_i|}\sum_{x\in S_i}x$ is the centroid of cluster $S_i$.

From this, it is clear where the non-convexity of the problem comes from: the objective function is defined over the discrete set $\mathscr S$ of "all possible partitions of $\{1,\ldots,n\}$ into $k$ sets" (in fact, $k$-means clustering is an NP-hard problem). Indeed, if you look at the definition of a convex function, it must be defined over a convex set, but $\mathscr S$ is clearly not convex, so the objective function can't be convex either.
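To make the discreteness concrete, one can enumerate the domain $\mathscr S$ for a tiny instance. The sketch below (with an illustrative helper name `partitions`) counts the $S(4,2)=7$ partitions of a $4$-point dataset into $2$ non-empty clusters, a finite set of isolated points rather than a convex set:

```python
from itertools import product

def partitions(n, k):
    """All unordered partitions of {0,...,n-1} into k non-empty clusters."""
    seen = set()
    for labels in product(range(k), repeat=n):
        if len(set(labels)) < k:
            continue  # skip labelings that leave some cluster empty
        # Canonical form: sort the clusters so relabelings collapse together.
        parts = tuple(sorted(tuple(i for i in range(n) if labels[i] == j)
                             for j in range(k)))
        seen.add(parts)
    return sorted(seen)

# The k-means objective for n = 4, k = 2 is defined on only 7 points.
print(len(partitions(4, 2)))  # 7
```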

Another way to see that the objective function is not convex is to show that its minimizer is not unique. To see this, first consider the following example: suppose you want to $2$-means cluster the dataset of real numbers $\{0,1,2,3\}$. You could exchangeably take $\mathbf S = \{S_1:=\{0,1\},S_2:=\{2,3\}\}$ or $\mathbf S = \{S_1:=\{2,3\},S_2:=\{0,1\}\}$, and the objective would take the same value.
By the same reasoning, if $\mathbf S = \{S_1,\ldots,S_k\}$ is a minimizer of the objective $(1)$, then for any permutation $\sigma$ of $\{1,\ldots,k\}$, $\mathbf{\tilde S} :=\{S_{\sigma(1)},\ldots,S_{\sigma(k)}\}$ is also a minimizer of $(1)$. Since the set of minimizers of a convex function is itself a convex set, a convex objective cannot have several isolated minimizers; the objective $(1)$ does, and is therefore not convex.
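This relabeling symmetry is easy to check numerically. A small sketch for the $\{0,1,2,3\}$ example (the helper name `objective` is illustrative):

```python
def objective(clusters):
    """k-means objective (1) for 1-D data: within-cluster sum of squares."""
    total = 0.0
    for S in clusters:
        mu = sum(S) / len(S)          # centroid of the cluster
        total += sum((x - mu) ** 2 for x in S)
    return total

S_a = [[0, 1], [2, 3]]   # S1 = {0,1}, S2 = {2,3}
S_b = [[2, 3], [0, 1]]   # the same partition with the labels swapped
print(objective(S_a), objective(S_b))  # 1.0 1.0 -- two distinct minimizers
```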

For this algorithm, "being stuck at a local minimum" means that the clusters have reached a configuration that no further iteration of the algorithm will change, even though a different configuration could yield a better value of the objective function. See the Discussion section of the Wikipedia article for examples of when/how that can happen.
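Here is one concrete instance of such a configuration (a constructed example, not taken from the article): four points at the corners of a wide rectangle, clustered row-wise, form a fixed point of steps 3 and 4 of the algorithm, even though the column-wise clustering is far better.

```python
def mean(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def rss(clusters):
    return sum(sqdist(p, mean(S)) for S in clusters for p in S)

# Four points at the corners of a 4-by-1 rectangle.
rows = [[(0, 0), (4, 0)], [(0, 1), (4, 1)]]   # a fixed point of the algorithm
cols = [[(0, 0), (0, 1)], [(4, 0), (4, 1)]]   # the global optimum

# The row clustering is "stuck": every point is already closest to its
# own cluster's centroid, so steps 3 and 4 change nothing...
c0, c1 = mean(rows[0]), mean(rows[1])
assert all(sqdist(p, c0) < sqdist(p, c1) for p in rows[0])
assert all(sqdist(p, c1) < sqdist(p, c0) for p in rows[1])

# ...yet its objective value is far worse than the optimum's.
print(rss(rows), rss(cols))  # 16.0 1.0
```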