So I am following Barrera, Högele and Pardo's paper, about cutoff thermalization in the Wasserstein distance (you can find it here) and they prove the shift linearity property that goes:
For $p\geq 1$, $u_1\in R^d$ a deterministic vector and $U_1$ a random vector in $R^d$ with finite $p$-th moment, it follows that $\mathcal{W}_p(u_1+U_1,U_1)=\vert u_1 \vert$.
And then, the proof starts as:
The synchronous replica $(U_1,U_1)$ with joint law $\Pi(du,du)$ (natural coupling) yields an upper bound for any $p>0$ as follows $$\mathcal{W}_p(u_1+U_1,U_1)\leq \left(\int_{R^d\times R^d} \vert u_1+u-u\vert ^{p} \Pi(du,du)\right)^{1/p}=\vert u_1 \vert.$$
Now, I have multiple questions on this. It is clear that it is an upper bound, by the definition of $\mathcal{W}_p$ as an infimum over couplings. However:
Why do they get the term $\vert u_1 +u -u \vert$ inside the integral? What I understand is that, under the joint law, both components have the same chance of taking a given value $u$, but that would not necessarily imply that their outcomes are exactly the same. In that case, why would it be wrong to think $$\mathcal{W}_p(u_1+U_1, U_1)\leq\mathbb{E}[\vert u_1 + U_1 - U_1 \vert^p]^{1/p}=\mathbb{E}[\vert u_1 \vert^p]^{1/p}=\vert u_1 \vert.$$
The coupling has 2 entries $(x,y)\in R^d \times R^d$ so why does it look as if it were one in terms of the other? As if we had only $x\in R^d$ but two maps, one to $U_1(x)$ and other to $U_1(x)+u_1$.
Why does it seem as if the $u-u$ cancels everywhere to yield the $\vert u_1 \vert$ equality? This is related to the first question. I understand that if we take $U_1(x)$ and $U_1(x)+u_1$, we have some chance of getting $u=U_1(x)$ and the same chance of getting $u+u_1=U_1(x)+u_1$. However, a coupling takes two entries and lives on all of $R^d\times R^d$. What I said could maybe work on the diagonal, but in principle $U_1(x)$ and $U_1(y)+u_1$ won't necessarily have the same chances, and even less the same outcomes.
With that said, I think that I am not correctly understanding the intuition behind couplings (or maybe even behind random variables), so any heuristics or explanations of how this "natural coupling" works will be highly appreciated.
I think it has something to do with the fact that $(U_1, U_1+u_1)$ is a deterministic coupling (see Villani p.6 here), since the function $T: R^d \to R^d$ given by $T(u)=u+u_1$ is measurable and pushes the law of $U_1$ forward to the law of $U_1+u_1$ (i.e. $T_{\#}du = dv$). Hence, the law $\Pi(du,dv)$ is concentrated on the graph of $T$, leading to
$$\int_{R^{d}\times R^{d}} \vert T(u) - u \vert \Pi(du,dv) =\int_{R^{d}\times R^{d}} \vert u_1 + u - u \vert \Pi(du,dv) = \vert u_1 \vert.$$
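To check this picture numerically, here is a minimal sketch of what I think is going on. It is only an illustration under my own assumptions: I take $U_1$ to be standard Gaussian in $R^3$, fix $p=2$, and pick an arbitrary $u_1$; none of these choices come from the paper. The synchronous coupling reuses the same sample of $U_1$ in both components, while the independent coupling draws a fresh sample for the second component. Both are valid couplings of the same two marginals, but only the first one sits on the graph of $T$.

```python
import math
import random

random.seed(0)

# Illustrative choices (assumptions, not taken from the paper).
d, n, p = 3, 20_000, 2
u1 = [1.0, -2.0, 0.5]


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def sample_U():
    """One sample of U_1; the standard Gaussian law is an assumption."""
    return [random.gauss(0.0, 1.0) for _ in range(d)]


def lp_cost(pairs):
    """Empirical (E|X - Y|^p)^(1/p) over a list of coupled pairs (x, y)."""
    total = sum(norm([a - b for a, b in zip(x, y)]) ** p for x, y in pairs)
    return (total / len(pairs)) ** (1 / p)


samples = [sample_U() for _ in range(n)]
shifted = [[a + b for a, b in zip(u1, u)] for u in samples]  # u1 + U_1

# Synchronous coupling: the SAME sample u appears in both components,
# so every pair is (u1 + u, u) and the difference is always u1.
sync_cost = lp_cost(list(zip(shifted, samples)))

# Independent coupling: the second component is a fresh copy of U_1.
indep_cost = lp_cost(list(zip(shifted, [sample_U() for _ in range(n)])))

print("|u_1|             =", norm(u1))
print("synchronous cost  =", sync_cost)
print("independent cost  =", indep_cost)
```

In this sketch the synchronous cost matches $\vert u_1 \vert$ up to floating-point rounding, because each coupled pair differs by exactly $u_1$, whereas the independent coupling gives a strictly larger cost. That seems consistent with reading $\Pi(du,du)$ as a measure supported only on pairs of the form $(u_1+u,u)$, rather than on all of $R^d\times R^d$.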