Different versions of the entropy term in the entropy-regularized Wasserstein distance


\begin{equation} \mathcal{W}_\epsilon(\alpha, \beta) = \min_{\pi\in \Pi(\alpha,\beta)} \int c(x,y) \,\mathrm{d}\pi(x,y) + \epsilon H(\pi \| \alpha \otimes \beta) \end{equation} Cuturi (2013) introduced the entropy-regularized Wasserstein distance, or Sinkhorn distance, shown above, where $\epsilon$ is the regularization parameter and $H(\pi \| \alpha \otimes \beta)$ is the relative entropy (KL divergence) between the transport plan and the product of the marginals.
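For concreteness, here is a minimal numerical sketch of the Sinkhorn iterations that solve this regularized problem in the discrete case. This is an illustration, not Cuturi's reference implementation; the function name `sinkhorn` and the iteration count are my own choices.

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=500):
    """Sinkhorn iterations for entropy-regularized optimal transport.

    a, b : marginal weight vectors (each sums to 1)
    C    : cost matrix, C[i, j] = c(x_i, y_j)
    eps  : regularization strength epsilon
    Returns the transport plan pi and the unregularized transport cost of pi.
    """
    K = np.exp(-C / eps)               # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)              # rescale to match column marginals
        u = a / (K @ v)                # rescale to match row marginals
    pi = u[:, None] * K * v[None, :]   # recovered transport plan
    cost = np.sum(pi * C)
    return pi, cost

# two uniform marginals on three points, squared-distance cost
x = np.array([0.0, 1.0, 2.0])
C = (x[:, None] - x[None, :]) ** 2
a = b = np.ones(3) / 3
pi, cost = sinkhorn(a, b, C, eps=0.1)
print(pi.sum(axis=1))  # row marginals match a = [1/3, 1/3, 1/3]
```

As $\epsilon$ grows, the returned plan $\pi$ spreads out toward the product measure, which is exactly the "smoothing" role of the entropy term discussed below.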

But I have seen the $H(\cdot)$ term shown in two different ways, one with entropy and the other with relative entropy:

\begin{align} H(\pi) &= \int \pi(x,y) \ln \pi(x,y) \,\mathrm{d}x\,\mathrm{d}y \\ H(\pi \| \alpha \otimes \beta) &= \int \ln \left(\frac{\mathrm{d}\pi (x,y)}{\mathrm{d}\alpha(x)\, \mathrm{d}\beta(y) } \right) \mathrm{d}\pi (x,y) \end{align}

How are these two expressions connected? They are clearly not identical, so why do both versions appear in the literature?


Answer 1 (accepted)

These two are actually equivalent up to a constant when $\pi$ is a coupling of $\alpha$ and $\beta$. I'll assume that $\pi,\alpha, \beta$ all have densities. We can then write:

$$ H(\pi||\alpha\otimes \beta) = \int\ln\left(\frac{d\pi}{d\alpha d\beta} \right)d\pi = \int \pi(x,y) \ln\left(\frac{\pi(x,y)}{\alpha(x)\beta(y)} \right) dx dy $$

Note that $\pi(x,y)$ is the density with respect to the Lebesgue measure, and the same can be said for $\alpha(x)$ and $\beta(y)$. Therefore:

\begin{align} H(\pi\|\alpha\otimes \beta) &= \int\pi(x,y)\ln \pi(x,y)\, dx\, dy - \int\pi(x,y)\ln(\alpha(x))\,dx\,dy - \int\pi(x,y)\ln(\beta(y))\,dx\,dy \\ &= \int \pi(x,y) \ln\pi(x,y)\, dx\, dy - \int\alpha(x)\ln\alpha(x)\, dx -\int \beta(y) \ln \beta(y)\, dy \\ &= H(\pi) - H(\alpha) - H(\beta), \end{align}

where the middle step uses that $\pi$ has marginals $\alpha$ and $\beta$, e.g. $\int \pi(x,y)\, dy = \alpha(x)$.

Since $\alpha$ and $\beta$ are fixed, we get $H(\pi \| \alpha \otimes \beta) = H(\pi) + C$, where $C = -H(\alpha) - H(\beta)$ is a constant independent of $\pi$.
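This identity is easy to verify numerically for a discrete coupling, using the convention $H(p) = \sum p \ln p$ from above:

```python
import numpy as np

rng = np.random.default_rng(0)

# a random discrete coupling pi on a 4x5 grid
pi = rng.random((4, 5))
pi /= pi.sum()
alpha = pi.sum(axis=1)   # row marginal
beta = pi.sum(axis=0)    # column marginal

def neg_entropy(p):
    # H(p) = sum p ln p  (the negative Boltzmann entropy convention used here)
    return np.sum(p * np.log(p))

kl = np.sum(pi * np.log(pi / (alpha[:, None] * beta[None, :])))
identity = neg_entropy(pi) - neg_entropy(alpha) - neg_entropy(beta)
print(np.isclose(kl, identity))  # True
```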

Answer 2

I would like to add a couple of points here which I think shouldn't be overlooked.

Neither of the choices is "wrong". In the 2013 Cuturi paper you reference, the regularizer is the "entropy" (note that this is actually the negative Boltzmann entropy):

$$ H(\pi)= \begin{cases} \int \pi \log \pi~~&\text{when}~\pi~\text{has a density} \\ +\infty & \text{otherwise} \end{cases}. $$

  1. The reason this is a natural choice of regulariser is that it does the "smoothing" or "relaxing" job that regularisation is intended to do. Adding $H$ to the optimal transport problem gives the mass "freedom to spread out". For example, let $\mu$ be concentrated on two points $x_1,x_2 \in \mathbb{R}$ with $\mu(x_1)=\mu(x_2)=\frac{1}{2}$, and let $\nu$ be concentrated on two points $y_1,y_2 \in \mathbb{R}$ with $\nu(y_1)=\nu(y_2)=\frac{1}{2}$; then the coupling $\pi$ which minimises $H$ (equivalently, maximises the Boltzmann entropy) is

$$ \pi(x_i,y_j)=\frac{1}{4},\forall~i,j.$$

  2. Since we have a minimisation problem, it's beneficial to add a strictly convex term, and again $H$ ticks that box!

  3. The choice of adding $H(\pi~\|~\alpha\otimes\beta)$, the relative entropy with respect to the product measure, has its own advantages, as outlined in https://audeg.github.io/publications/these_aude.pdf. As far as I understand, it lets you rephrase the dual problem in a neat way.

  4. Now, comparing the two choices: I don't think it matters much either way; both do the same job. As the other answer points out, the minimiser is the same, and the two objectives differ by a constant $C$. Lastly, remember that when "doing regularisation" you usually have a small parameter $\epsilon \ll 1$ multiplying the regularisation term, hence

$$ \epsilon \Big(H(\pi~||~\alpha\otimes\beta)-H(\pi)\Big)=\epsilon C \ll 1 .$$
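The two-point example in item 1 can be checked numerically. Couplings of two uniform two-point marginals form a one-parameter family $\pi = \begin{pmatrix} t & \frac{1}{2}-t \\ \frac{1}{2}-t & t \end{pmatrix}$ with $t \in [0, \frac{1}{2}]$, and a quick scan confirms that $H(\pi) = \sum \pi \ln \pi$ is minimised by the independent coupling $t = \frac{1}{4}$:

```python
import numpy as np

# couplings of mu = (1/2, 1/2) and nu = (1/2, 1/2): one free parameter t
ts = np.linspace(1e-6, 0.5 - 1e-6, 1001)

def H(t):
    pi = np.array([[t, 0.5 - t], [0.5 - t, t]])
    return np.sum(pi * np.log(pi))   # negative Boltzmann entropy

values = [H(t) for t in ts]
t_best = ts[int(np.argmin(values))]
print(round(t_best, 3))  # 0.25: the independent coupling pi(x_i, y_j) = 1/4
```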
