If $\alpha = \beta$, why can't the entropy-regularized Wasserstein distance equal $0$?


In optimal transportation theory, the optimal re-allocation of probability distribution $\alpha$'s mass to another distribution $\beta$ is solved by minimizing the Wasserstein distance with respect to the transport plan.

$$W (\alpha, \beta) = \min_{\pi\in \Pi(\alpha,\beta)} \int c(x,y) \,\mathrm{d}\pi(x,y) $$

Alternatively, the relative entropy-regularized Wasserstein distance, also called Sinkhorn distance, can be used:

$$W_\epsilon (\alpha, \beta) = \min_{\pi\in \Pi(\alpha,\beta)} \int c(x,y) \,\mathrm{d}\pi(x,y) + \epsilon H(\pi \| \alpha \otimes \beta),$$ where $\epsilon$ is the regularization parameter and the relative entropy is $$H(\pi \| \alpha \otimes \beta) = \int \ln \left(\frac{\mathrm{d}\pi (x,y)}{\mathrm{d}\alpha(x)\, \mathrm{d}\beta(y) } \right) \mathrm{d}\pi (x,y). $$ Aude Genevay observed that in the extreme case where the source and target distributions are identical, $\alpha = \beta$, we would expect the entropy-regularized Wasserstein distance (Sinkhorn distance) to equal $0$, since there is nothing to move; however, it fails to do so. Because of this she proposes the Sinkhorn divergence instead, a normalization which does equal $0$ when $\alpha = \beta$:

$$\bar{W}_\epsilon (\alpha, \beta) = W_\epsilon (\alpha, \beta) - \frac{1}{2} [W_\epsilon (\alpha, \alpha) + W_\epsilon (\beta, \beta) ]$$ In other words, $\bar{W}_\epsilon (\alpha, \alpha) = 0$.
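As a quick numerical sanity check (a sketch of my own, not from Genevay's work), one can run plain Sinkhorn fixed-point iterations on a small discrete example and observe that $W_\epsilon(\alpha,\alpha) > 0$ while the divergence $\bar{W}_\epsilon(\alpha,\alpha)$ cancels to $0$:

```python
import numpy as np

def sinkhorn_distance(a, b, C, eps, n_iter=2000):
    """Entropy-regularized OT cost <pi, C> + eps * H(pi || a x b),
    computed with standard Sinkhorn fixed-point iterations."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    pi = u[:, None] * K * v[None, :]      # (approximately) optimal plan
    kl = np.sum(pi * np.log(pi / np.outer(a, b)))  # relative entropy term
    return np.sum(pi * C) + eps * kl

# identical source and target: alpha = beta on three points
x = np.array([0.0, 1.0, 2.0])
a = np.array([0.2, 0.5, 0.3])
C = (x[:, None] - x[None, :]) ** 2        # squared-distance cost
eps = 0.1

w_aa = sinkhorn_distance(a, a, C, eps)
print(w_aa)                                # strictly positive, not 0
print(w_aa - 0.5 * (w_aa + w_aa))          # Sinkhorn divergence: exactly 0
```

Both terms of the regularized objective are nonnegative, and (as the answer below explains) no single plan makes both vanish, which is why the printed distance stays strictly positive.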

Questions

  1. Why (or for what levels of regularization) can't the Sinkhorn distance, shown earlier, achieve $0$?
  2. Does standard optimal transport, which uses the unregularized Wasserstein distance, also suffer from this incapability (even though I know that the Wasserstein distance by itself, without OT, will achieve $0$)?
  3. Why, mathematically, does the Sinkhorn divergence achieve $0$?

1 Answer

  1. To show why this is the case, I think the way to go is to consider the first term in

$$W_\epsilon (\alpha, \beta) = \min_{\pi\in \Pi(\alpha,\beta)} \int c(x,y) \mathrm{d}\pi(x,y) + \epsilon H(\pi \| \alpha \otimes \beta)$$

and note that it is zero only when $\pi(A\times B)=\pi((A\cap B)\times(A\cap B))$ (i.e. when $\pi$ puts all the mass onto the diagonal $x=y$), and positive otherwise. Then you are just left to check what happens to $\epsilon H(\pi \| \alpha \otimes \beta)$ when $\pi$ is of this form.
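To make that check concrete in the discrete case (a worked example of my own, not part of the original answer): take $\alpha=\beta=\sum_i a_i\delta_{x_i}$ and let $\pi$ be the diagonal plan $\pi_{ij}=a_i\mathbf{1}_{\{i=j\}}$. The cost term then vanishes, but

$$\epsilon H(\pi\|\alpha\otimes\alpha)=\epsilon\sum_i a_i \ln\frac{a_i}{a_i\,a_i} = -\epsilon\sum_i a_i\ln a_i > 0$$

unless $\alpha$ is a Dirac mass. So the zero-cost plan pays a strictly positive entropy penalty, and hence $W_\epsilon(\alpha,\alpha)>0$.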

  2. When you say "even though I know that the Wasserstein distance by itself, without OT, will achieve 0", this doesn't really make sense: the Wasserstein distance is an OT problem itself. To see why $W_2(\alpha,\alpha)=0$, take $\gamma=f_\#\alpha$ where $f:\mathbb{R}^d\to \text{diag}(\mathbb{R}^d)$ is defined by $f(x)=(x,x)$, and define as a candidate $\pi\in\Pi(\alpha,\alpha)$ (not necessarily optimal) $\pi(A)=\gamma(A \cap \text{diag}(\mathbb{R}^d))$, where $\text{diag}(\mathbb{R}^d)=\{(x,x)\in\mathbb{R}^{2d}~:~x\in\mathbb{R}^d\}$. This plan has zero cost, since it only moves mass from $x$ to $x$.

  3. Are you asking how $\bar{W}_\epsilon(\alpha,\alpha)=W_\epsilon(\alpha,\alpha)-\frac{1}{2}(W_\epsilon(\alpha,\alpha)+W_\epsilon(\alpha,\alpha))=0$? It is immediate from the definition: the two correction terms exactly cancel the first.


Edit: for 1), with $c(x,y)=\|x-y\|^2$, proceed like this:

  • Note that $H(\pi\|\gamma)=\int h\left(\frac{\mathrm{d}\pi}{\mathrm{d}\gamma}\right)\mathrm{d}\gamma$ for $h(x)=x\log x$, which is strictly convex. Hence, by Jensen's inequality, $H(\pi\|\gamma)\geq 0$, with equality if and only if $\frac{\mathrm{d}\pi}{\mathrm{d}\gamma}$ is constant, i.e. $\pi=\gamma$.
  • So if the entropy term vanishes, i.e. $\pi=\alpha\otimes\alpha$, we are forced to consider (where we set $M_i$ equal to the $i$-th moment of $\alpha$) $$ \frac{1}{2}\int \|x-y\|^2 \,\mathrm{d}\alpha(x)\,\mathrm{d}\alpha (y) = M_2-\int x\cdot y \,\mathrm{d}\alpha(x)\,\mathrm{d}\alpha(y) = M_2 -\|M_1\|^2 >0,$$

where the double integral factorizes as $\int x\cdot y \,\mathrm{d}\alpha(x)\,\mathrm{d}\alpha(y)=\|M_1\|^2$, and the final strict inequality is Jensen's inequality for the strictly convex function $x\mapsto \|x\|^2$ (strict as long as $\alpha$ is not a Dirac mass).
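Putting the two bullets together (a one-line summary in my own words): for non-degenerate $\alpha$,

$$W_\epsilon(\alpha,\alpha)=\min_{\pi}\left\{\int \|x-y\|^2\,\mathrm{d}\pi(x,y) + \epsilon H(\pi\|\alpha\otimes\alpha)\right\} > 0,$$

because the cost term vanishes only for the diagonal plan, the entropy term vanishes only for $\pi=\alpha\otimes\alpha$, and these two plans coincide only when $\alpha$ is a Dirac mass.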