Principal Component Analysis Summarization by the mean

Let’s consider $S=\{x_1, \dots, x_N\}$ with $x_1, \dots, x_N \in \mathbb{R}^d$. How can I prove that the solution of $$ \arg\min_{b \in \mathbb{R}^d} \frac{1}{N}\sum_{i=1}^{N}||x_i-b||^2 $$ is given by $$ b=\bar x=\frac{1}{N} \sum_{i=1}^{N} x_i \,? $$ That is, the mean is the best vector summarizing/representing a sample set in the least-squares sense.

There is 1 best solution below

Best answer

This is a classic general result (not really related to PCA).

You only need to write

$$ \sum_i ||x_i - b||^2=\sum_i ||x_i - \overline{x} + \overline{x}- b||^2= \sum_i ||x_i-\overline{x}||^2 +N ||\overline{x}-b||^2 $$

where the last equality holds because the cross term is zero: $2\sum_i c \cdot (x_i - \overline{x})=0$ (where $c=\overline{x}-b$ is a constant vector, i.e. it does not depend on $i$, and the product is a scalar product).
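
For reference, the expansion behind this step can be written out term by term. This is only a sketch using the symbols already introduced, with $\cdot$ denoting the scalar product. For each $i$,

$$ ||x_i - b||^2 = ||x_i-\overline{x}||^2 + 2\,(x_i-\overline{x})\cdot(\overline{x}-b) + ||\overline{x}-b||^2 $$

and summing over $i=1,\dots,N$ gives

$$ \sum_i ||x_i - b||^2 = \sum_i ||x_i-\overline{x}||^2 + 2\,(\overline{x}-b)\cdot\sum_i (x_i-\overline{x}) + N\,||\overline{x}-b||^2 $$

The middle term drops out because $\sum_i (x_i-\overline{x}) = \sum_i x_i - N\overline{x} = 0$ by the definition of $\overline{x}$.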

The first term does not depend on $b$ and the second is nonnegative, so the sum is minimized exactly when the second term is zero, i.e. when $b=\overline{x}$.
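
If a numerical sanity check helps, the decomposition can be tested on random data. The following is a minimal sketch in plain Python, not part of the proof; the helper names `mean_vector` and `avg_sq_dist`, the sample size, the dimension, and the Gaussian test data are all assumptions made for illustration.

```python
import random

def mean_vector(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    d = len(points[0])
    return [sum(p[k] for p in points) / n for k in range(d)]

def avg_sq_dist(points, b):
    """Objective (1/N) * sum_i ||x_i - b||^2."""
    total = sum(sum((pk - bk) ** 2 for pk, bk in zip(p, b)) for p in points)
    return total / len(points)

random.seed(0)  # fixed seed so the run is reproducible
N, d = 50, 3    # illustrative sizes, chosen arbitrarily
points = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
x_bar = mean_vector(points)

# Pick an arbitrary candidate b and compare both sides of the identity
# (divided by N): objective(b) = objective(x_bar) + ||x_bar - b||^2.
b = [random.gauss(0, 1) for _ in range(d)]
lhs = avg_sq_dist(points, b)
rhs = avg_sq_dist(points, x_bar) + sum((m - bk) ** 2 for m, bk in zip(x_bar, b))
gap = abs(lhs - rhs)

# The objective at the mean should never exceed the objective elsewhere.
candidates = [[random.gauss(0, 2) for _ in range(d)] for _ in range(200)]
mean_is_best = all(
    avg_sq_dist(points, x_bar) <= avg_sq_dist(points, c) for c in candidates
)

print("identity gap:", gap)
print("mean beats all sampled candidates:", mean_is_best)
```

The gap should be zero up to floating-point rounding, and no sampled candidate should beat the mean, which is what the identity predicts.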