Coupling methods


Distance between probability measures

Let $(X,d)$ be a compact metric space, and let $\mu$ and $\nu$ be two probability measures on $X$. We can define the Wasserstein distance between $\mu$ and $\nu$ in the following way.

Let $\mathcal{P}_{\mu, \nu}$ be the set of probability measures on $X \times X$ whose first marginal is $\mu$ and whose second marginal is $\nu$. Then:

$$W_1 (\mu, \nu) = \min_{\pi \in \mathcal{P}_{\mu, \nu}} \int_{X \times X} d(x,y) \pi (dx,dy).$$
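For measures supported on finitely many points of the real line (with $d(x,y) = |x-y|$), the optimal coupling is explicit and $W_1$ reduces to the area between the two CDFs, so no linear program over $\mathcal{P}_{\mu,\nu}$ is needed. A minimal sketch of this special case; the names `points`, `mu`, `nu` are illustrative:

```python
# Sketch: W1 between two discrete probability measures on a finite,
# sorted subset of the real line, using the one-dimensional identity
# W1(mu, nu) = integral of |F_mu(t) - F_nu(t)| dt.

def wasserstein_1d(points, mu, nu):
    """W1 distance for measures with weights mu, nu on sorted `points`."""
    assert abs(sum(mu) - 1.0) < 1e-12 and abs(sum(nu) - 1.0) < 1e-12
    total = 0.0
    f_mu = f_nu = 0.0  # running CDF values
    for i in range(len(points) - 1):
        f_mu += mu[i]
        f_nu += nu[i]
        # |F_mu - F_nu| is constant on [points[i], points[i+1])
        total += abs(f_mu - f_nu) * (points[i + 1] - points[i])
    return total

# Example: point mass at 0 vs point mass at 1 -> distance 1.
print(wasserstein_1d([0.0, 1.0], [1.0, 0.0], [0.0, 1.0]))  # 1.0
```

In higher dimensions no such shortcut exists and one genuinely minimizes over couplings, e.g. with a linear-programming solver.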

Distance between compact spaces

Let $(X,d)$ and $(X',d')$ be two compact metric spaces. We can define the Gromov-Hausdorff distance between $(X,d)$ and $(X',d')$ in the following way.

Let $\mathcal{D}_{d, d'}$ be the set of metrics on $X \coprod X'$ whose restriction to $X \times X$ is $d$ and whose restriction to $X' \times X'$ is $d'$. Then:

$$d_{GH} ((X,d), (X',d')) = \inf_{D \in \mathcal{D}_{d, d'}} D_H (X,X'),$$

where $D_H$ is the Hausdorff distance in $X \coprod X'$ for the distance $D$.
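For tiny finite metric spaces this infimum can be computed by brute force, using the standard equivalent characterization of $d_{GH}$ via correspondences: $d_{GH} = \frac{1}{2}\inf_R \operatorname{dis}(R)$, where $R \subseteq X \times X'$ projects onto both factors and $\operatorname{dis}(R) = \sup |d(x_1,x_2) - d'(y_1,y_2)|$ over pairs in $R$. A sketch (function name and encoding as distance matrices are my own choices):

```python
# Sketch: brute-force Gromov-Hausdorff distance for tiny finite metric
# spaces, via the correspondence/distortion characterization
#   d_GH(X, X') = (1/2) * inf over correspondences R of dis(R).
# Exponential in |X| * |X'|; only usable for very small examples.
from itertools import combinations

def gromov_hausdorff(dX, dY):
    nx, ny = len(dX), len(dY)
    pairs = [(i, j) for i in range(nx) for j in range(ny)]
    best = float("inf")
    for r in range(1, len(pairs) + 1):
        for R in combinations(pairs, r):
            # R must project onto all of X and all of X'.
            if {i for i, _ in R} != set(range(nx)):
                continue
            if {j for _, j in R} != set(range(ny)):
                continue
            dis = max(abs(dX[i1][i2] - dY[j1][j2])
                      for (i1, j1) in R for (i2, j2) in R)
            best = min(best, dis)
    return best / 2

# Two-point spaces with diameters 1 and 3: d_GH = |1 - 3| / 2 = 1.
print(gromov_hausdorff([[0, 1], [1, 0]], [[0, 3], [3, 0]]))  # 1.0
```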

Generalizations?

These two distances are defined in a similar way. To create a distance between two objects $X$ and $Y$, we realize them simultaneously into a larger object, and quantify their distance in the larger object. Then, we take the infimum.

They also have the same kind of "universal" definitions. The Wasserstein distance between two probability measures can be defined as a minimum over all couplings between these probability measures (without specifying the base space). The Gromov-Hausdorff distance can be defined as a minimum over all isometric embeddings of the compact metric spaces.

In addition, these definitions are sort of compatible, in that it is possible to define a Gromov-Hausdorff-Wasserstein distance between compact metric spaces endowed with probability measures (see e.g. here).

From here, I have two questions:

1) Why does the definition of the Wasserstein distance use a product, while the definition of the Gromov-Hausdorff distance uses a coproduct? I suspect it has to do with the fact that measures are naturally pushed forward and distances are naturally pulled back (this is also what the .pdf I linked to suggests), but that's little more than a hunch.

2) Are there other examples of this kind of construction, where we define a distance between two structures by realizing them simultaneously, taking the distance in some way in this realization, and then taking the infimum over all such realizations? I was thinking in particular of algebraic structures, but anything goes.


There are 3 answers below.


Regarding question 2: in graph theory there is the notion of a cut metric.

Suppose we are given graphs $G, H$ with the same number of vertices. For any subsets of vertices $U,V$, define $e(U,V)$ to be the number of edges between $U$ and $V$. Now for any bijection $f: V(G) \rightarrow V(H)$, we may compute $$ \max_{\emptyset \neq U,V \subseteq V(G)}{\frac{|e(U,V) - e(f(U),f(V))|}{|U| |V|}}. $$ Then the cut metric $\delta_{\square}(G,H)$ is defined to be the minimum over all bijections $f$ of the above quantity.
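The definition above can be evaluated by brute force for toy graphs. A minimal sketch, with graphs encoded as adjacency matrices on vertices $\{0,\dots,n-1\}$ and $e(U,V)$ counting ordered pairs, as the symmetric formula allows:

```python
# Sketch: brute-force cut metric for tiny graphs on the same vertex
# set, following the formula above (normalization by |U||V|, nonempty
# U, V).  Exponential in n; toy use only.
from itertools import permutations, chain, combinations

def subsets(vs):
    # All nonempty subsets of the vertex list vs.
    return chain.from_iterable(combinations(vs, r) for r in range(1, len(vs) + 1))

def e(adj, U, V):
    # Number of ordered pairs (u, v) in U x V joined by an edge.
    return sum(adj[u][v] for u in U for v in V)

def cut_metric(adjG, adjH):
    n = len(adjG)
    vs = list(range(n))
    best = float("inf")
    for f in permutations(vs):  # f is a bijection on vertices
        worst = max(
            abs(e(adjG, U, V) - e(adjH, [f[u] for u in U], [f[v] for v in V]))
            / (len(U) * len(V))
            for U in subsets(vs) for V in subsets(vs)
        )
        best = min(best, worst)
    return best

# A path 1-0-2 is isomorphic to the path 0-1-2, so some bijection
# matches all cuts and the distance is 0.
P = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path 0-1-2
Q = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # path 1-0-2
print(cut_metric(P, Q))  # 0.0
```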

This metric is useful because if you generate $G$ and $H$ randomly by independently including each edge with some fixed probability, the distance between them will be small. Moreover, the well-known Szemerédi regularity lemma is equivalent to the statement that the completion of the space of graphs under the cut metric is compact (one needs to extend the metric to graphs with different numbers of vertices).

Anyway, this notion seems analogous to the Gromov-Hausdorff distance.

1
On

Part 1:

The probability measures $\mu$ and $\nu$ are defined on the same metric space; only the measures differ (usually $\mu \neq \nu$). To quantify by how much, we can define a metric such as the total variation distance $$d_{TV}(\mu,\nu)=\sup_{A}|\mu(A)-\nu(A)|,$$ where the supremum runs over measurable sets $A \subseteq X$. This metric also has a coupling characterization: $$d_{TV}(\mu,\nu)=\inf\{P(X\neq Y): X, Y\ \mbox{random variables s.t.}\ \mathcal{L}(X)=\mu,\ \mathcal{L}(Y)=\nu\}.$$
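On a finite set the supremum over events can be checked directly against the familiar identity $d_{TV} = \frac{1}{2}\sum_i |\mu_i - \nu_i|$. A small sketch with illustrative distributions:

```python
# Sketch: total variation distance on a finite set, comparing the
# sup-over-events definition with the half-L1 identity by brute force
# over all events A.
from itertools import chain, combinations

def tv_sup(mu, nu):
    # sup over all subsets A of |mu(A) - nu(A)|.
    n = len(mu)
    events = chain.from_iterable(combinations(range(n), r) for r in range(n + 1))
    return max(abs(sum(mu[i] for i in A) - sum(nu[i] for i in A)) for A in events)

def tv_l1(mu, nu):
    return 0.5 * sum(abs(m - v) for m, v in zip(mu, nu))

mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
print(tv_sup(mu, nu), tv_l1(mu, nu))  # both approximately 0.3
```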

When we are interested in the distance between compact metric spaces, both the metric $d$ and the underlying space $X$ may differ from $(X',d')$. The question to be answered is how far these metric spaces are from being isometric to each other. I think this concept is fairly different, and it also explains why in this case we take a distance between sets rather than between probability measures. The difference lies essentially in the definition of isometry.

If $(X,d)$ is isometric to $(X',d')$, one can find a mapping $f: X \to X'$ such that $d'(f(x_0),f(x_1))=d(x_0,x_1)$ for all $x_0,x_1\in X$. If no such map exists, a metric should quantify the discrepancy. What one can do is consider isometric embeddings $f: X \to M$ and $g: X' \to M$ into a common metric space $M$; in this new metric space, the former metric spaces $(X,d)$ and $(X',d')$ are coupled. Their internal distances are preserved, and after embedding, the Hausdorff distance $d_H(f(X),g(X'))$ is a new quantity to compare. Ranging over all such $M$ and all such $f,g$ produces various numbers, and the infimum among them determines the discrepancy.

Part 2:

Yes, there are several metrics with the minimum or maximum property you describe, for example the total variation metric above; the Lévy metric, the Prokhorov metric, and the discrepancy metric are others. Here is an almost complete list of such distances.
