\begin{equation} \mathcal{W}_\epsilon(\alpha, \beta) = \min_{\pi\in \Pi(\alpha, \beta)} \int c(x,y) \,\mathrm{d}\pi(x,y) + \epsilon H(\pi \| \alpha \otimes \beta) \end{equation} Cuturi (2013) introduced the entropy-regularized Wasserstein distance, or Sinkhorn distance, shown above, where $\epsilon$ is the regularization parameter and $H(\pi \| \alpha \otimes \beta)$ is the relative entropy (KL divergence) between the transport plan and the product of the marginals.
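For concreteness, here is a minimal numpy sketch of how this objective is typically minimized on discrete measures via Sinkhorn iterations (function and variable names are my own, not from any particular library; the returned value is the linear cost $\langle \pi, C\rangle$ of the regularized plan, not the full objective with the $\epsilon$-term):

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, n_iter=1000):
    """Entropy-regularized OT between histograms a, b with cost matrix C."""
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)             # enforce column marginals
        u = a / (K @ v)               # enforce row marginals
    pi = u[:, None] * K * v[None, :]  # regularized optimal plan
    return np.sum(pi * C), pi

# Two uniform histograms on point clouds on the line
x = np.linspace(0, 1, 5)
y = np.linspace(0, 1, 6)
C = (x[:, None] - y[None, :]) ** 2    # squared-distance cost
a = np.full(5, 1 / 5)
b = np.full(6, 1 / 6)
cost, pi = sinkhorn(C, a, b, eps=0.05)
```

As $\epsilon \to 0$ the plan $\pi$ approaches an unregularized optimal coupling; larger $\epsilon$ blurs it toward the independent coupling $a \otimes b$.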
But I have seen the $H(\cdot)$ term shown in two different ways, one with entropy and the other with relative entropy:
\begin{align} H(\pi) &= \int \pi(x,y) \ln \pi(x,y) \,\mathrm{d}x\,\mathrm{d}y \\ H(\pi \| \alpha \otimes \beta) &= \int \ln \left(\frac{\mathrm{d}\pi (x,y)}{\mathrm{d}\alpha(x)\, \mathrm{d}\beta(y) } \right) \mathrm{d}\pi (x,y) \end{align}
How are these two connected? They are clearly not identical, so why do both versions appear in the literature?
These two are actually equivalent up to a constant when $\pi$ is a coupling of $\alpha$ and $\beta$. I'll assume that $\pi,\alpha, \beta$ all have densities. We can then write:
$$ H(\pi \| \alpha\otimes \beta) = \int\ln\left(\frac{\mathrm{d}\pi}{\mathrm{d}\alpha\, \mathrm{d}\beta} \right)\mathrm{d}\pi = \int \pi(x,y) \ln\left(\frac{\pi(x,y)}{\alpha(x)\beta(y)} \right) \mathrm{d}x\, \mathrm{d}y $$
Note that $\pi(x,y)$ is the density with respect to the Lebesgue measure, and the same can be said for $\alpha(x)$ and $\beta(y)$. Therefore:
\begin{align*} H(\pi \| \alpha\otimes \beta) &= \int\pi(x,y)\ln \pi(x,y) \,\mathrm{d}x\,\mathrm{d}y - \int\pi(x,y)\ln\alpha(x)\,\mathrm{d}x\,\mathrm{d}y - \int\pi(x,y)\ln\beta(y)\,\mathrm{d}x\,\mathrm{d}y \\ &= \int \pi(x,y) \ln\pi(x,y) \,\mathrm{d}x\,\mathrm{d}y - \int\alpha(x)\ln\alpha(x)\, \mathrm{d}x -\int \beta(y) \ln \beta(y)\, \mathrm{d}y \\ &= H(\pi) - H(\alpha) - H(\beta), \end{align*}
where the second equality uses the marginal constraints $\int \pi(x,y)\,\mathrm{d}y = \alpha(x)$ and $\int \pi(x,y)\,\mathrm{d}x = \beta(y)$.
Since $\alpha$ and $\beta$ are fixed, this is $H(\pi) + C$ for a constant $C$, so the two regularizers differ only by a constant and yield the same minimizer.
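The identity is easy to check numerically on discrete distributions. The sketch below (my own construction) builds a non-independent coupling of random marginals by iterative proportional fitting of a random positive matrix, then compares the two expressions, using the same sign convention $H(p) = \sum p \ln p$ as above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 5
alpha = rng.random(n); alpha /= alpha.sum()
beta  = rng.random(m); beta  /= beta.sum()

# Build a coupling of alpha and beta by iterative proportional
# fitting (alternating row/column rescaling) of a random matrix.
pi = rng.random((n, m))
for _ in range(500):
    pi *= (alpha / pi.sum(axis=1))[:, None]  # match row marginals
    pi *= (beta  / pi.sum(axis=0))[None, :]  # match column marginals

# Relative entropy H(pi || alpha ⊗ beta)
kl = np.sum(pi * np.log(pi / np.outer(alpha, beta)))

# Same sign convention as in the derivation: H(p) = sum p log p
H = lambda p: np.sum(p * np.log(p))
print(np.allclose(kl, H(pi) - H(alpha) - H(beta)))  # True
```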