How can I understand Wasserstein Metric?

310 Views Asked by Bumbble Comm At 27 Mar 2026 - 1:50

I've come across the Wasserstein metric in several topics, mostly in sampling and in mathematical models of machine learning.

For two probability densities $\mu,\nu$ on $\mathbb{R}^d$, the Wasserstein distance between $\mu$ and $\nu$ can be defined as: $$ W_2(\mu,\nu) = \left( \inf_{P \in \Gamma(\mu,\nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^2 \, dP(x,y) \right)^{1/2} $$ where $\Gamma(\mu,\nu)$ is the set of all joint distributions on $\mathbb{R}^{2d}$ with marginal distributions $\mu$ and $\nu$.

My questions are:

How can I understand the definition intuitively?
My teacher says there is a connection between the Wasserstein distance, partial differential equations, and optimization. How can I understand that?
Are there any good references or notes on this topic?


There are 2 best solutions below

Bumbble Comm On 09 Nov 2023 - 11:14 BEST ANSWER

One way I like to think about the Wasserstein distance, similar to Robert's answer and Milo's comment, is as the amount of effort required to reshape one distribution, $\mu$, into the other, $\nu$; more literally, it is a distance between two probability distributions. This can be visualized with a picture.

This is the most intuitive interpretation I have seen. As for your second question, the Wasserstein distance comes up when studying sampling algorithms such as Metropolis-Hastings, which are commonly used to answer questions similar to those posed in optimization settings. Instead of learning a single optimal parameter (as in direct optimization), you learn the distribution of the optimal parameter. In MCMC, the Wasserstein distance is used to evaluate how close the sampled distribution is to the target distribution, in addition to the more direct connection to optimization mentioned by Robert. A large Wasserstein distance between the target and proposal distribution may suggest a poor choice of parameters or acceptance probability, so the distance can be used to tune the algorithm: a lower Wasserstein distance corresponds to more accurate results.
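As a rough illustration of using the distance as a sampler diagnostic, here is a minimal sketch, assuming one-dimensional samples and equal sample sizes on both sides. The function name `empirical_w2_1d` and the two "chains" are hypothetical: they are i.i.d. normal draws standing in for sampler output, not an actual Metropolis-Hastings run. In one dimension the optimal coupling for a squared-distance cost pairs sorted samples with sorted samples, so the empirical $W_2$ reduces to a root-mean-square difference of order statistics.

```python
import numpy as np


def empirical_w2_1d(xs, ys):
    """Empirical 2-Wasserstein distance between two equal-size 1D samples.

    Assumption: both samples are one-dimensional and have the same length,
    so the optimal plan simply matches the k-th smallest x to the k-th
    smallest y.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch assumes equally many samples on both sides")
    return float(np.sqrt(np.mean((xs - ys) ** 2)))


rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=5000)      # reference draws from the target
good_chain = rng.normal(0.0, 1.0, size=5000)  # stand-in for a well-tuned sampler
bad_chain = rng.normal(1.5, 1.0, size=5000)   # stand-in for a biased sampler

d_good = empirical_w2_1d(good_chain, target)  # small: only sampling noise
d_bad = empirical_w2_1d(bad_chain, target)    # close to the mean shift of 1.5
print(d_good, d_bad)
```

The stand-in with the shifted mean ends up much farther from the target than the unbiased one, which is the kind of signal described above.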

A more in-depth source on the Wasserstein distance can be found at: https://library.oapen.org/bitstream/id/b27ca94b-41a7-486c-863f-8de6b3a8f914/2020_Book_AnInvitationToStatisticsInWass.pdf

Bumbble Comm On 23 Dec 2019 - 6:20

It may be easier to think of discrete masses rather than densities. Suppose $\mu$ and $\nu$ are discrete probability measures. You try to get $\nu$ from $\mu$ by moving around various packets of mass. The cost of moving a packet of mass $m$ a distance $d$ is $m d^2$. The minimum total cost of doing this is the squared Wasserstein distance $W_2(\mu,\nu)^2$, so $W_2$ itself is the square root of that minimum. Finding the way to transform $\mu$ into $\nu$ at minimum total cost is the connection to optimization.
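The packet-moving picture can be written as a small linear program. Below is a minimal sketch, assuming SciPy is available; the function name `discrete_w2` and the example points are made up for illustration. The unknown is a plan $P_{ij} \ge 0$, the mass moved from $x_i$ to $y_j$, with row sums equal to the weights of $\mu$ and column sums equal to the weights of $\nu$.

```python
import numpy as np
from scipy.optimize import linprog


def discrete_w2(x, a, y, b):
    """W_2 between sum_i a[i]*delta(x[i]) and sum_j b[j]*delta(y[j]).

    Hypothetical helper: x and y hold the support points (scalars or
    d-dimensional rows), a and b are probability weights summing to 1.
    """
    n, m = len(a), len(b)
    x = np.asarray(x, dtype=float).reshape(n, -1)
    y = np.asarray(y, dtype=float).reshape(m, -1)
    # cost[i, j] = squared distance between x_i and y_j (cost per unit mass)
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    # With P flattened row by row, these rows encode sum_j P[i, j] = a[i] ...
    row_sums = np.kron(np.eye(n), np.ones((1, m)))
    # ... and these encode sum_i P[i, j] = b[j].
    col_sums = np.kron(np.ones((1, n)), np.eye(m))
    res = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    plan = res.x.reshape(n, m)
    # res.fun is the minimum total cost; the distance is its square root.
    return float(np.sqrt(max(res.fun, 0.0))), plan


# mu puts mass 1/2 at 0 and 1; nu puts mass 1/2 at 1 and 2.
w2, plan = discrete_w2([0.0, 1.0], [0.5, 0.5], [1.0, 2.0], [0.5, 0.5])
print(w2)
print(plan)
```

In this example, shifting both packets by one unit (total cost $0.5 \cdot 1 + 0.5 \cdot 1 = 1$) beats leaving the packet at 1 in place and moving the other one two units (cost $0.5 \cdot 4 = 2$), so the minimum cost is 1 and the distance is 1. The squared cost is what makes many short moves cheaper than one long move.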
