I'm following some code for implementing Wasserstein distance. They provide a link to this paper https://arxiv.org/pdf/1509.02237.pdf
On page 10 they state Proposition 1: the $p$-Wasserstein distance between two probability measures $P$ and $Q$ on $\mathbb{R}$ with finite $p$-th moments can be written as
$$W_{p}^p(P,Q) = \int^{1}_{0} |F^{-1}(t) - G^{-1}(t)|^p dt $$
where $F^{-1}, G^{-1}$ are the quantile functions of $P$ and $Q$ respectively. The proof is given on page 17. I follow most of it, but how do they arrive at the last equality,
$$\int_{\operatorname{supp} \pi^*} |x-y|^p \, d\pi^*(x,y) = \int^{1}_{0} |F^{-1}(t) - G^{-1}(t)|^p \, dt$$
My guess is that they substituted $F(x) = G(y) = t$, so that $|x-y| = |F^{-1}(t) - G^{-1}(t)|$, but I can't get the rest to work out. Can somebody fill in the missing steps?
Let $(X, Y) \sim \pi^*$ be the optimal coupling and let $P^{Y|X=x}$ be the conditional distribution of $Y$ given $X=x$ (formally I would use disintegration here). Then the cost is given by $$\int\int |x-y|^p \, dP^{Y | X=x}(y) \, dP^X(x).$$ Now we can use that, for $U\sim \mathrm{Unif}[0, 1]$, the random variable $F^{-1}(U)$ has the same distribution as $X$. Hence, by a change of variables, the OT cost becomes $$\int_0^1 \int |F^{-1}(t)-y|^p \, dP^{Y | X=F^{-1}(t)}(y) \, dt.$$ Finally, by the observations in the paper, one can deduce that $P^{Y|X=F^{-1}(t)}=\delta_{G^{-1}(t)}$, so the inner integral equals $|F^{-1}(t)-G^{-1}(t)|^p$ and the cost is $\int_0^1 |F^{-1}(t)-G^{-1}(t)|^p \, dt$.
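As a numerical sanity check of the identity (not a proof), here is a minimal sketch for two empirical measures with the same number of distinct sample points. It compares the quantile integral $\int_0^1 |F^{-1}(t)-G^{-1}(t)|^p \, dt$, approximated by a midpoint rule, with the cost of pairing the sorted samples, which is what the coupling $(F^{-1}(U), G^{-1}(U))$ does in this case. The function names `quantile`, `wp_quantile` and `wp_sorted` are made up for illustration and do not come from the paper or from any library.

```python
import math
import random


def quantile(sorted_xs, t):
    """Generalized inverse F^{-1}(t) = inf{x : F(x) >= t} of the empirical CDF.

    Assumes sorted_xs is sorted ascending and 0 < t <= 1.
    """
    n = len(sorted_xs)
    # For distinct samples F(x_(k)) = k/n, so the smallest k with k/n >= t
    # is ceil(t * n).
    k = max(1, math.ceil(t * n))
    return sorted_xs[k - 1]


def wp_quantile(xs, ys, p, grid=20000):
    """Midpoint-rule approximation of int_0^1 |F^{-1}(t) - G^{-1}(t)|^p dt."""
    sx, sy = sorted(xs), sorted(ys)
    total = 0.0
    for i in range(grid):
        t = (i + 0.5) / grid
        total += abs(quantile(sx, t) - quantile(sy, t)) ** p
    return total / grid


def wp_sorted(xs, ys, p):
    """Cost of pairing the i-th smallest x with the i-th smallest y.

    Assumes len(xs) == len(ys).
    """
    sx, sy = sorted(xs), sorted(ys)
    return sum(abs(a - b) ** p for a, b in zip(sx, sy)) / len(sx)


if __name__ == "__main__":
    random.seed(0)
    xs = [random.gauss(0.0, 1.0) for _ in range(50)]
    ys = [random.gauss(1.0, 2.0) for _ in range(50)]
    for p in (1, 2, 3):
        print(p, wp_quantile(xs, ys, p), wp_sorted(xs, ys, p))
```

Both quantile functions are step functions that are constant on the intervals $((k-1)/n, k/n]$, so when `grid` is a multiple of the sample size the midpoint rule is exact up to floating-point rounding and the two numbers should agree.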
Disclaimer: I am pretty sure I am missing some point as well, since it seems to me that it would be much simpler to show that the coupling $\pi^*$ defined by $(F^{-1}(U), G^{-1}(U))\sim \pi^*$ is optimal because it is cyclically monotone.
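The optimality of the coupling $(F^{-1}(U), G^{-1}(U))$ can at least be checked by brute force on tiny examples. The sketch below assumes two uniform empirical measures with the same number of points, in which case an optimal coupling can be chosen among the permutations, and compares the best permutation cost with the cost of the sorted pairing. The helper names are hypothetical, and this is only an illustration for $p \ge 1$, not an argument.

```python
import itertools
import random


def pairing_cost(xs, ys, perm, p):
    """Average of |x_i - y_perm(i)|^p for the coupling that sends x_i to y_perm(i)."""
    return sum(abs(xs[i] - ys[j]) ** p for i, j in enumerate(perm)) / len(xs)


def brute_force_wp(xs, ys, p):
    """Minimum pairing cost over all n! permutations (only feasible for small n)."""
    n = len(xs)
    return min(
        pairing_cost(xs, ys, perm, p)
        for perm in itertools.permutations(range(n))
    )


def comonotone_wp(xs, ys, p):
    """Cost of the sorted pairing, i.e. of (F^{-1}(U), G^{-1}(U)) for equal-size samples."""
    sx, sy = sorted(xs), sorted(ys)
    return sum(abs(a - b) ** p for a, b in zip(sx, sy)) / len(sx)


if __name__ == "__main__":
    random.seed(1)
    xs = [random.uniform(-1.0, 1.0) for _ in range(6)]
    ys = [random.uniform(0.0, 3.0) for _ in range(6)]
    for p in (1, 2, 3):
        print(p, brute_force_wp(xs, ys, p), comonotone_wp(xs, ys, p))
```

For each $p$ the two printed values should coincide up to rounding, which is consistent with the sorted pairing being optimal.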