how to get geometric median in 1D by solving an optimization problem?

451 Views Asked by At

I have a set of numbers $\{x_1,x_2,\cdots\}$. The median could be obtained by sorting, but I want to get via solving an optimization problem. I know that the media minimizes $$\sum_i \left| {{x_i} - \mu } \right|$$

However, when solving the optimization problem via a solver, I get a discrepancy between the true median and obtained median. what could go wrong ?

P.S. the median could be thought of as the point that if placed between the points, the distance between it and all points is the smallest.

1

There are 1 best solutions below

0
On BEST ANSWER

It's true that the median has this property.

If the number of points is odd, then the median is the only number with this property. If you're getting a discrepancy there, then either the solver is inaccurate, or there's a problem with your input.

However, if you have an even number of points, there could be many equally good solutions. Suppose the points are in increasing order: $x_1 \le x_2 \le \dots \le x_{2n}$. Then for any point $\mu \in [x_n, x_{n+1}]$, we have $$ \sum_{i=1}^{2n} |x_i - \mu| = \sum_{i=1}^n (\mu - x_i) + \sum_{i=n+1}^{2n} (x_i - \mu) = \sum_{i=n+1}^{2n} x_i - \sum_{i=1}^n x_i. $$ There are $n$ terms where $\mu$ is positive and $n$ terms where $\mu$ is negative, so they cancel, and there is no dependency on $\mu$: all points in this range are equally good!

In fact, all points in the range $[x_n, x_{n+1}]$ will be the optimal solutions. (If $\mu > x_{n+1}$, then there are more $+\mu$ terms than $-\mu$ terms, so we decrease the sum by decreasing $\mu$. If $\mu < x_n$, then there are more $-\mu$ terms than $+\mu$ terms, so we decrease the sum by increasing $\mu$.)

The median is usually defined as $\frac{x_n + x_{n+1}}{2}$ in this case, but an LP solver is more likely to give $x_n$ or $x_{n+1}$ as the optimal solution.