Kantorovich distance: discrete distributions


Let $\mu,\nu$ be two discrete distributions on $\Bbb R$, and for simplicity assume that each takes only a finite number of values. For example, $\mu$ gives probabilities $p_i$ to points $x_i$ and $\nu$ gives probabilities $q_j$ to points $y_j$. The Kantorovich distance between them can be computed as $$ \int_{-\infty}^\infty |F_\mu(t) - F_\nu(t)|\,\mathrm dt, $$ where $F_\mu$ and $F_\nu$ are the cumulative distribution functions. Is there a way to simplify this expression in my setting?


$\newcommand{\d}{\mathrm{d}}$In fact, the Wasserstein distance (another name for the Kantorovich distance) between two probability measures $P$ and $Q$ on a measurable space $(\Omega, \mathcal{F})$, with $\Omega\subseteq\Bbb R$ so that $|x-y|$ makes sense, is defined as follows (I'll give the definition of the distance of order $1$):

$$ W(P,Q) = \inf_{\mu} \left\{ \int_{\Omega\times \Omega} |x-y|\d \mu \left| \begin{array}{l} \mu: \text{ prob. measure on } (\Omega \times \Omega, \mathcal{F}\otimes \mathcal{F})\\ \text{with marginals } P, Q \end{array} \right. \right\} $$

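In the definition, $(\Omega\times \Omega, \mathcal{F}\otimes \mathcal{F})$ is the product measurable space, and the infimum runs over all couplings $\mu$ of $P$ and $Q$. Notice that we may extend the definition so that $P$ is a measure on a space $(\Omega, \mathcal{F})$ and $Q$ is a measure on $(\Omega', \mathcal{F}')$, as long as the cost $|x-y|$ is still defined for $x\in\Omega$ and $y\in\Omega'$.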

Let us now see how the above applies in the case of discrete sample spaces. For generality, let us assume that $P$ is a measure on $(\Omega, \mathcal{F})$ where $\Omega=\{\omega_i\}_{i=1}^{s}$ and $Q$ is a measure on $(\Omega', \mathcal{F}')$ where $\Omega'=\{\omega_j'\}_{j=1}^{s'}$; the two spaces are not required to have the same cardinality. Write $p_i = P(\{\omega_i\})$ and $q_j = Q(\{\omega_j'\})$.

Then, the distance between $P$ and $Q$ becomes

$$ W(P,Q) = \inf_{\{\lambda_{i,j}\}_{i, j}} \left\{ \sum_{i=1}^{s}\sum_{j=1}^{s'}\lambda_{i,j} |\omega_i - \omega_j'|: \sum_{i=1}^{s}\lambda_{i,j} = q_j, \sum_{j=1}^{s'}\lambda_{i,j} = p_i, \lambda_{i,j}\geq 0 \right\} $$

This is a linear program in the $s\,s'$ variables $\lambda_{i,j}$ (a transportation problem), so it can be solved with any standard LP solver.
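On the real line you do not even need a solver, because the integral from the question can be evaluated directly. Both CDFs are step functions, so if $z_1 < z_2 < \dots < z_m$ denotes the sorted union of the support points of $P$ and $Q$ (my notation, not from the question), the integrand is constant on each $[z_k, z_{k+1})$ and vanishes outside $[z_1, z_m)$, which gives

$$ \int_{-\infty}^\infty |F_P(t) - F_Q(t)|\,\mathrm{d}t = \sum_{k=1}^{m-1} |F_P(z_k) - F_Q(z_k)|\,(z_{k+1} - z_k). $$

Here is a minimal Python sketch of that sum. The function name `kantorovich_1d` and its argument layout are my own choices for illustration, not part of any library, and it assumes both probability lists sum to $1$.

```python
def kantorovich_1d(xs, ps, ys, qs):
    """Order-1 Kantorovich distance between two finite discrete
    distributions on the real line, via the CDF formula.

    xs, ps: support points and probabilities of the first distribution
    ys, qs: support points and probabilities of the second distribution
    (hypothetical interface; both probability lists are assumed to sum to 1)
    """
    # Signed mass at each point: +p from the first law, -q from the second.
    mass = {}
    for x, p in zip(xs, ps):
        mass[x] = mass.get(x, 0) + p
    for y, q in zip(ys, qs):
        mass[y] = mass.get(y, 0) - q

    points = sorted(mass)  # z_1 < z_2 < ... < z_m
    total = 0
    diff = 0  # running value of F_P(z_k) - F_Q(z_k)
    for left, right in zip(points, points[1:]):
        diff += mass[left]
        # The CDF difference is constant on [left, right).
        total += abs(diff) * (right - left)
    return total


# Mass 1/2 at 0 and 1/2 at 1, against a point mass at 2.
print(kantorovich_1d([0, 1], [0.5, 0.5], [2], [1.0]))  # 1.5
```

In the example every unit of mass has to travel to the point $2$, so the cost is $\tfrac12\cdot 2 + \tfrac12\cdot 1 = 1.5$, which matches the value of the linear program above for the only feasible $\lambda$. The cost of the sketch is dominated by the sort, so it runs in $O(m\log m)$ time.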