The closed expression for the barycenter in the EMD space ($W_1$ space) of 1d histograms

71 Views Asked by At

Let $\alpha$ and $\beta$ be two discrete measures $$ \alpha = \sum\limits_{i=1}^N a_i \delta_i $$ and $$ \beta = \sum\limits_{i=1}^N b_i \delta_i $$ with $\sum\limits_{i=1}^Na_i = \sum\limits_{i=1}^Nb_i =1 $ and $a_i, b_i\ge 0$.

if $\delta_i \in \{1, 2,\dots, N\}$ then, the Earth Mover Distance (the Wasserstein distance with $p=1$) have a simple closed expression

$$ EMD(\alpha, \beta) = \sum\limits_{i=1}^N\left|\sum\limits_{j=1}^ia_i - b_i\right| $$

Given a set of histograms (discrete measures), $\{\alpha_1, \dots, \alpha_K\}$ I want to find a histogram $\gamma$ which minimizes the mean of the $EMD$ distances,

$$ \gamma = \arg\min_{\gamma \in \Gamma} \frac{1}{K}\sum_{i=1}^K EMD(\gamma, \alpha_i) $$

My question it's: There is a closed expression to obtain $\gamma$ whithout solve the LP? If not, which is the easiest way to solve this problem for this specific case?

1

There are 1 best solutions below

0
On

For general $K$ and $\alpha_i$, I'd be surprised if there's an easier way to solve this than by solving the LP.

For $K=2$ I can think of an easy to compute barycenter:

$$ Barycenter_{EMD}(\alpha, \beta) = PMF\left( \frac{ CDF(\alpha) + CDF(\beta)}{2}\right)$$

Where CDF gives us the cumulative density function (interpreted as a vector) of a probability measure, and PMF expects such a cumulative density function and gives us the corresponding probability vector.

The reason this works is because the $EMD$ of two distributions corresponds to the area between the curves of their CDFs, so finding a CDF that splits this area in half gives us a solution.

Very rough sketch of example CDFs with the computed barycenter CDF inbetween:

enter image description here