Distance between probability measures
Let $(X,d)$ be a compact metric space, and let $\mu$ and $\nu$ be two probability measures on $X$. We can define the Wasserstein distance between $\mu$ and $\nu$ in the following way.
Let $\mathcal{P}_{\mu, \nu}$ be the set of probability measures on $X \times X$ whose first marginal is $\mu$ and whose second marginal is $\nu$. Then:
$$W_1 (\mu, \nu) = \min_{\pi \in \mathcal{P}_{\mu, \nu}} \int_{X \times X} d(x,y) \pi (dx,dy).$$
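To see the coupling definition in action for discrete measures: when $\mu$ and $\nu$ are both uniform on $n$ points, Birkhoff's theorem guarantees that the minimizing coupling can be taken to be a permutation matrix, so for small $n$ one can brute-force the minimum. A toy sketch (the function name and the matrix convention `dist[i][j] = d(x_i, y_j)` are my own choices):

```python
from itertools import permutations

def wasserstein1_uniform(dist):
    """W1 between two uniform measures on n points each, given the
    n x n matrix dist[i][j] = d(x_i, y_j).  For uniform marginals the
    coupling LP has an optimal solution at a permutation matrix
    (Birkhoff's theorem), so brute force over permutations suffices
    for small n.  Illustration only: factorial-time complexity."""
    n = len(dist)
    return min(
        sum(dist[i][p[i]] for i in range(n)) / n
        for p in permutations(range(n))
    )
```

For instance, with point sets $\{0, 1\}$ and $\{2, 3\}$ on the real line, every optimal matching transports each point by $2$, so the function returns $2$.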
Distance between compact spaces
Let $(X,d)$ and $(X',d')$ be two compact metric spaces. We can define the Gromov-Hausdorff distance between $(X,d)$ and $(X',d')$ in the following way.
Let $\mathcal{D}_{d, d'}$ be the set of distances on $X \coprod X'$ whose restriction to $X \times X$ is $d$ and whose restriction to $X' \times X'$ is $d'$. Then:
$$d_{GH} ((X,d), (X',d')) = \inf_{D \in \mathcal{D}_{d, d'}} D_H (X,X'),$$
where $D_H$ is the Hausdorff distance in $X \coprod X'$ for the distance $D$.
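For finite spaces, the inner quantity $D_H(X, X')$ is easy to compute once a candidate gluing metric $D$ is fixed. A minimal sketch (the function name and the representation of $D$ as a Python callable are my own conventions):

```python
def hausdorff(D, X, Y):
    """Hausdorff distance between finite subsets X and Y of a metric
    space, where D(a, b) returns the distance between points a and b:
    the larger of the two one-sided sup-inf distances."""
    sup_x = max(min(D(x, y) for y in Y) for x in X)
    sup_y = max(min(D(x, y) for x in X) for y in Y)
    return max(sup_x, sup_y)
```

The hard part of $d_{GH}$ is the outer infimum: for finite spaces it amounts to optimizing over the cross-distances $D(x, x')$ subject to the triangle inequalities, which cannot be enumerated directly.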
Generalizations?
These two distances are defined in a similar way. To create a distance between two objects $X$ and $Y$, we realize them simultaneously into a larger object, and quantify their distance in the larger object. Then, we take the infimum.
They also have the same kind of "universal" definitions. The Wasserstein distance between two probability measures can be defined as a minimum over all couplings between these probability measures (without specifying the base space). The Gromov-Hausdorff distance can be defined as a minimum over all isometric embeddings of the compact metric spaces.
In addition, these definitions are compatible, in the sense that it is possible to define a Gromov-Hausdorff-Wasserstein distance between probabilized compact metric spaces (see e.g. here).
From here, I have two questions:
1) Why does the definition of the Wasserstein distance use a product, while the definition of the Gromov-Hausdorff distance uses a coproduct? I suspect it has to do with the fact that measures are naturally pushed forward and distances are naturally pulled back (this is also what the .pdf I linked to suggests), but that's little more than a hunch.
2) Are there other examples of this kind of construction, where we define a distance between two structures by realizing them simultaneously, taking the distance in some way in this realization, and then taking the infimum over all such realizations? I was thinking in particular of algebraic structures, but anything goes.
Regarding question 2: in graph theory there is the notion of the cut metric.
Suppose we are given graphs $G, H$ with the same number of vertices. For any nonempty subsets of vertices $U, V$, define $e(U,V)$ to be the number of edges between $U$ and $V$. Now for any bijection $f \colon V(G) \rightarrow V(H)$, we may compute $$ \max_{U,V \subseteq V(G)}{\frac{|e(U,V) - e(f(U),f(V))|}{|U| |V|}}. $$ Then the cut metric $\delta_{\square}(G,H)$ is defined to be the minimum of the above quantity over all bijections $f$.
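For tiny graphs this definition can be evaluated by brute force, enumerating all bijections and all subset pairs. A sketch only (exponential time, so purely illustrative; the adjacency-matrix input format and the convention of counting ordered pairs in $e(U,V)$ are my own choices):

```python
from itertools import permutations, combinations

def cut_distance(adj_G, adj_H):
    """Brute-force cut metric between two graphs on the vertex set
    {0, ..., n-1}, given as 0/1 adjacency matrices: minimize over all
    bijections the maximum normalized cut discrepancy over nonempty
    subset pairs U, V.  e(U, V) counts ordered pairs (u, v) with an
    edge, which is a consistent convention for both graphs."""
    n = len(adj_G)
    verts = range(n)
    subsets = [list(c) for k in range(1, n + 1)
               for c in combinations(verts, k)]

    def e(adj, U, V):
        return sum(adj[u][v] for u in U for v in V)

    best = float("inf")
    for p in permutations(verts):
        worst = max(
            abs(e(adj_G, U, V)
                - e(adj_H, [p[u] for u in U], [p[v] for v in V]))
            / (len(U) * len(V))
            for U in subsets for V in subsets
        )
        best = min(best, worst)
    return best
```

As a sanity check, the distance between a graph and itself is $0$, and between the triangle and the empty graph on three vertices it is $1$ (witnessed by a single-edge pair $U, V$).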
This metric is useful because if you generate $G, H$ randomly by independently picking each edge with some fixed probability, the distance between them will be small. Moreover, the well-known Szemerédi regularity lemma is equivalent to the statement that the completion of the space of graphs under the cut metric is compact (one needs to extend the metric to graphs with different numbers of vertices).
Anyway, this notion seems analogous to the Gromov-Hausdorff distance.