Showing that mean of vectors minimizes the sum of the squared distances.

4.1k Views Asked by At

Let $S=\{x_1,...,x_n\}$ be a set of vectors in $\mathbb{R}^d$. Now we have to pick a vector $\mu$, such that the following expression is minimized:

$$ L(\mu)=\sum_{x\in S} ||x-\mu||_2^2. $$

I think that the best option to minimize $L(\mu)$, would be to use the mean of the vectors as $\mu$. However, I can't prove why this is so.

Here's a complete answer that I solved thanks to the help of David K.

$$ \begin{align*} \mu^*&= \arg\min_\mu \sum_{x\in S} ||x-\mu||_2^2 \\ &= \arg\min_\mu \sum_{x\in S} <x-\mu,x-\mu> \\ &= \arg\min_\mu \sum_{x\in S} \big(<x,x> -2<x,\mu>+<\mu,\mu>\big) \\ &= \arg\min_\mu n<\mu,\mu>-2\sum_{x\in S}<x,\mu> \\ &= \arg\min_\mu n<\mu,\mu>-2n\left<\frac{1}{n}\sum_{x\in S}x,\mu\right> \\ &= \arg\min_\mu <\mu,\mu>-2<\overline{x},\mu> \\ &= \arg\min_\mu <\mu,\mu>-2<\overline{x},\mu> + <\overline{x},\overline{x}> \\ &= \arg\min_\mu <\mu-\overline{x},\mu-\overline{x}> \\ &= \arg\min_\mu ||\mu-\overline{x}||_2^2. \end{align*} $$ The $||\cdot ||_2$ can never be smaller than $0$. Therefore, choosing $\mu=\overline{x}$ minimizes the expression, as $||\mu-\overline{x}||_2^2$ becomes $0$. Hence, $\mu^*=\overline{x}$.

2

There are 2 best solutions below

6
On BEST ANSWER

Since the symbol $\mu$ is already in use in the question, let's write $\bar x$ to denote the mean of the vectors in $S$; that is, $$ \bar x = \frac1n \sum_{x\in S} x. $$

Then by the linearity of the inner product, \begin{align} \sum_{x\in S} \langle x, \mu\rangle &= \left\langle \sum_{x\in S} x, \mu\right\rangle \\ &= n \left\langle \frac1n\sum_{x\in S} x, \mu\right\rangle \\ &= n \left\langle \bar x, \mu\right\rangle. \\ \end{align}

With this, you can eliminate the individual $x$s from your last formula, leaving only $\mu$ and $\bar x$.

Now consider the quantity $\lVert \mu - \bar x\rVert_2^2$. That's something that clearly minimized when $\mu = \bar x$, since the norm $\lVert \cdot\rVert_2$ can never be less than zero. That is, $$ \arg\min_\mu \, \lVert \mu - \bar x\rVert_2^2 = \bar x \tag1 $$ So it would be really convenient if we could reduce your minimization problem to something that looks like the left-hand side of Equation $(1)$.

Now consider some of the techniques you already used in your first attempt. You know that $\lVert \mu - \bar x\rVert_2^2 = \lVert\mu\rVert_2^2 - 2 \langle\mu,\bar x\rangle + \lVert\bar x\rVert_2^2$, and you know you can add or subtract a constant from the value inside the $\arg\min$ without changing the $\mu$ that minimizes the value. Also notice that in your second attempt, you found that $$ \mu^* = \arg\min_\mu \, (- 2 \langle\bar x,\mu\rangle + \lVert\mu\rVert_2^2). $$

At this point, you're just a couple of steps away from showing that $\mu^* = \bar x$. (I'm trying not to spoil this too much, because it's so much fun when a problem resolves like this, especially when you get to make the final "aha!" step yourself.)

1
On

Since $S = \{x_1, ..., x_n\}$ is finite let $L(\mu) = \sum_{i = 1}^{n} ||x_i-\mu||_2^2$, which is a function from $\mathbb{R}^d$ to $\mathbb{R}$. In the following I will use upper indices to denote the respective components of the vectors involved, e.g. $x_1 = (x_1^1, x_1^2, ... x_1^d)$. Calculating the partial derivatives of $L(\mu)$ with respect to $\mu^k$ where $1 \leq k \leq d$ we get

$\frac{\partial L}{\partial \mu^k} = 2n\mu^k - 2\sum_{i=1}^{n} x_i^k$. Now, if we want this to vanish we get

$2n\mu^k - 2\sum_{i=1}^{n} x_i^k = 0 \Leftrightarrow \mu^k = 1/n \cdot \sum_{i=1}^{n} x_i^k$.

That is $\mu = \frac{1}{n} \begin{pmatrix} \sum_{i=1}^{n} x_i^1 \\...\\ \sum_{i=1}^{n} x_i^d \end{pmatrix} = \frac{1}{n} \sum_{j=1}^{n} x_j$.

Now you need to go on and prove this gives a global minima for $L$.