I am going through the book Computational Optimal Transport by Peyré and Cuturi. In it (see Formula (4.1), p. 65/209) I came across a definition of discrete entropy which is not what I expected: $$ H(P) = - \sum_{i,j}P_{i,j}(\log(P_{i,j})-1), $$ with $P \in [0,1]^{n \times m}$ being a coupling matrix, i.e. describing a transport between discrete probability distributions. What confuses me is the $-1$ in the sum, and I would like to know why it is included. Since $\sum_{i,j} P_{i,j}=1$, it only shifts the usual entropy by a constant ($+1$). Still, it confuses me.
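For concreteness, here is a small numerical check (my own illustration, not from the book) that the $-1$ only shifts the usual entropy by a constant:

```python
import math

# A small 2x2 coupling matrix with total mass 1 (illustration only).
P = [[0.1, 0.2],
     [0.3, 0.4]]

# Usual discrete entropy vs. the book's definition (4.1).
H_standard = -sum(p * math.log(p) for row in P for p in row)
H_book = -sum(p * (math.log(p) - 1) for row in P for p in row)

# Since the entries sum to 1, the book's definition exceeds the usual
# entropy by exactly sum_ij P_ij = 1.
print(H_book - H_standard)  # → 1.0 (up to floating-point rounding)
```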
Alternate definition of Entropy
356 Views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
There are 3 answers below.
I think it might just be a strange error. In Cuturi's paper on Sinkhorn distances, entropy is defined in the usual way.
Adding to the answer by @J.G., one reason for adding the $-1$ term could be to have cleaner expressions for the partial derivative of the dual formulation with respect to the transport plan matrix $\mathbf{P}$.
If there were no $-1$ term, then the partial derivative (page 63 of Computational Optimal Transport by Peyré and Cuturi) would have an extra $\epsilon$ term. Having the $-1$ in the coupling entropy eliminates this $\epsilon$ term, leading to a simpler expression for the Sinkhorn iterations.
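Concretely, here is a sketch of that computation (my own working from the entropic objective $\langle C, P\rangle - \epsilon H(P)$; $f_i$, $g_j$ denote the dual potentials for the marginal constraints, notation mine). With the book's definition of $H$,

$$ \frac{\partial}{\partial P_{i,j}}\Big[\langle C, P\rangle - \epsilon H(P)\Big] = C_{i,j} + \epsilon\big((\log P_{i,j} - 1) + 1\big) = C_{i,j} + \epsilon \log P_{i,j}. $$

Setting $C_{i,j} + \epsilon\log P_{i,j} = f_i + g_j$ yields $P_{i,j} = e^{f_i/\epsilon}\,e^{-C_{i,j}/\epsilon}\,e^{g_j/\epsilon}$, with no stray constant. With the plain entropy $-\sum_{i,j}P_{i,j}\log P_{i,j}$, the derivative would instead be $C_{i,j} + \epsilon(\log P_{i,j} + 1)$, and the extra $\epsilon$ would have to be absorbed into the potentials.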
The obvious reason $-\sum_{i,\,j}P_{i,\,j}\ln P_{i,\,j}$ is a more common definition is that it's extensive, but we often don't care about that, so much as about relative entropies. In that case, one may as well add a constant $c$ to taste, giving $c-\sum_{i,\,j}P_{i,\,j}\ln P_{i,\,j}=-\sum_{i,\,j}P_{i,\,j}(\ln P_{i,\,j}-c)$. The choice $c=1$ has the unique attraction that the whole expression is just $-\sum_{i,\,j}\int_0^{P_{i,\,j}}\ln p\,dp$.
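As a quick sanity check on that last identity (my own illustration, not part of the answer), one can verify numerically that $\int_0^{P}\ln p\,dp = P(\ln P - 1)$, so each term $-P_{i,j}(\ln P_{i,j}-1)$ is the area under $-\ln p$ from $0$ to $P_{i,j}$:

```python
import math

# The antiderivative of ln(p) is p*ln(p) - p, so the integral of ln(p)
# from 0 to P equals P*(ln(P) - 1). Check this with a midpoint rule;
# ln has an integrable singularity at 0, and the midpoint rule never
# evaluates ln at exactly 0.
P = 0.3
n = 100_000
h = P / n
integral = sum(math.log((k + 0.5) * h) for k in range(n)) * h

closed_form = P * (math.log(P) - 1)
print(integral, closed_form)  # the two values agree closely
```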