Question about transportation-entropy inequality (From Villani's book: Optimal Transport, Old and New)


I was reading Villani's book, Optimal Transport: Old and New.

On pages 80-83, he introduces some results on the dual formulation of transport inequalities.

Let $C(\mu,\nu)$ denote the optimal transport cost from a probability measure $\mu$ (defined on $\mathcal{X}$) to $\nu$ (defined on $\mathcal{Y}$), with cost function $c(\cdot,\cdot)$.
Given a convex functional $F$ defined on $P(\mathcal{X})$, we can define its Legendre transform $\Lambda$ by $\Lambda(\phi):=\sup_{\mu\in P(\mathcal{X})}\left(\int_{\mathcal{X}}\phi\,d\mu-F(\mu)\right)$, so that $\Lambda$ is a convex functional on $C_b(\mathcal{X})$.
The main result (Theorem 5.26) states that \begin{equation*} \forall \mu \in P(\mathcal{X}),\quad C(\mu,\nu)\leq F(\mu) \end{equation*} and \begin{equation*} \forall \phi \in C_b(\mathcal{Y}),\quad \Lambda\left(\int_\mathcal{Y}\phi\,d\nu-\phi^c\right)\leq 0, \qquad \phi^c(x):=\sup_{y\in\mathcal{Y}}\left(\phi(y)-c(x,y)\right), \end{equation*} are equivalent.

Theorem 5.26 is used to analyze transport inequalities. In Example 5.29, the author gives an important application:
Assume $\mathcal{Y}=\mathcal{X}$ and consider \begin{equation*} F(\mu):=KL(\mu\|\nu)=\int_{\mathcal{X}}\ln\left(\frac{d\mu}{d\nu}\right)d\mu, \end{equation*} the Kullback-Leibler divergence between the two measures (also known as relative entropy). The Legendre transform of $F$ has the form \begin{equation*} \Lambda(\phi)=\ln\left(\int_{\mathcal{X}}e^{\phi}\,d\nu\right). \end{equation*} Thus, by Theorem 5.26, the transportation-entropy inequality \begin{equation*} C(\mu,\nu)\leq KL(\mu\|\nu) \quad \forall \mu\in P(\mathcal{X}) \tag{1} \end{equation*} is equivalent to \begin{equation*} \ln\left(\int_{\mathcal{X}}e^{\int_{\mathcal{X}} \phi\,d\nu-\phi^c}\,d\nu\right)\leq 0 \;\Leftrightarrow\; e^{\int_{\mathcal{X}}\phi\,d\nu}\leq \left(\int_{\mathcal{X}}e^{-\phi^c}\,d\nu\right)^{-1} \quad \forall \phi\in C_b(\mathcal{X}). \tag{2} \end{equation*}

But in the book, the author directly arrives at \begin{equation*} e^{\int_{\mathcal{X}}\phi\,d\nu}\leq \int_{\mathcal{X}}e^{\phi^c}\,d\nu \quad \forall \phi\in C_b(\mathcal{X}). \tag{3} \end{equation*} By the Cauchy-Schwarz inequality, $1=\left(\int_{\mathcal{X}}e^{-\phi^c/2}e^{\phi^c/2}\,d\nu\right)^2\leq \int_{\mathcal{X}}e^{-\phi^c}\,d\nu\int_{\mathcal{X}}e^{\phi^c}\,d\nu$, so (2) implies (3), and hence (1) implies (3): \begin{equation*} e^{\int_{\mathcal{X}}\phi\,d\nu}\leq \left(\int_{\mathcal{X}}e^{-\phi^c}\,d\nu\right)^{-1}\leq \int_{\mathcal{X}}e^{\phi^c}\,d\nu. \end{equation*}

But how can we go from (3) back to (1)? I am quite confused about this. I even suspect that the equivalence of (1) and (3) is not a trivial corollary of Theorem 5.26.
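As a sanity check on the two ingredients used above, here is a minimal numerical sketch on a finite space (my own illustration, not taken from the book). It assumes a five-point space, a randomly generated $\nu$ and $\phi$, and a quadratic cost chosen only as an example; all helper names are hypothetical. It checks that $\ln\int e^{\phi}d\nu$ is attained in the Legendre transform by the measure with density $e^{\phi}/Z$ with respect to $\nu$, and that the right-hand side of (2) is bounded by the right-hand side of (3). It says nothing about the converse direction, which is what I am asking about.

```python
import math
import random


def kl(mu, nu):
    """Relative entropy KL(mu || nu) on a finite space (0 * log 0 := 0)."""
    return sum(m * math.log(m / n) for m, n in zip(mu, nu) if m > 0)


def log_mgf(phi, nu):
    """Lambda(phi) = log of the integral of e^phi with respect to nu."""
    return math.log(sum(n * math.exp(p) for p, n in zip(phi, nu)))


def gibbs_measure(phi, nu):
    """Candidate maximizer: the measure with density e^phi / Z w.r.t. nu."""
    weights = [n * math.exp(p) for p, n in zip(phi, nu)]
    z = sum(weights)
    return [w / z for w in weights]


def dual_value(phi, mu, nu):
    """int phi dmu - KL(mu || nu); its supremum over mu is Lambda(phi)."""
    return sum(p * m for p, m in zip(phi, mu)) - kl(mu, nu)


def c_transform(phi, cost):
    """phi^c(x) = max_y (phi(y) - c(x, y)) for a cost matrix cost[x][y]."""
    return [max(p - row[y] for y, p in enumerate(phi)) for row in cost]


def random_probability(n, rng):
    """A random probability vector with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


rng = random.Random(0)
n = 5  # size of the finite space (assumption for the illustration)
nu = random_probability(n, rng)
phi = [rng.uniform(-2.0, 2.0) for _ in range(n)]
points = [rng.uniform(0.0, 1.0) for _ in range(n)]
cost = [[(x - y) ** 2 for y in points] for x in points]  # illustrative cost

# Legendre transform: the Gibbs measure attains log int e^phi dnu,
# and randomly drawn measures do not exceed that value.
best = dual_value(phi, gibbs_measure(phi, nu), nu)
others = [dual_value(phi, random_probability(n, rng), nu) for _ in range(1000)]

# Cauchy-Schwarz step: (int e^{-phi^c} dnu)^{-1} <= int e^{phi^c} dnu.
phi_c = c_transform(phi, cost)
rhs_of_2 = 1.0 / sum(m * math.exp(-p) for p, m in zip(phi_c, nu))
rhs_of_3 = sum(m * math.exp(p) for p, m in zip(phi_c, nu))

print(abs(best - log_mgf(phi, nu)) < 1e-9)  # Gibbs measure attains Lambda(phi)
print(max(others) <= best + 1e-9)           # no sampled measure does better
print(rhs_of_2 <= rhs_of_3)                 # bound (2) is at least as strong as (3)
```

The sketch uses only the standard library so that it can be run as is; replacing `cost` or `nu` with other choices should not change the outcome of the three checks.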

Can anyone with expertise in this area help me with this problem? Many thanks!