Let's jump right in and define the dual form of the Wasserstein-1 distance between two measures $\mu$ and $\nu$ as follows:
$$W_1(\mu, \nu ) = \sup_{f \in \text{1-Lipschitz} } \int f d\mu - \int f d\nu$$
Because $W_1$ is defined as a supremum, the following inequality holds for any 1-Lipschitz $f$: $$W_1(\mu, \nu ) \ge \int f d\mu - \int f d\nu$$
Switching from integral form to expected value notation:
$$W_1(\mu, \nu ) \ge \mathbb{E}_{x \sim \mu}[ f(x) ] - \mathbb{E}_{x \sim \nu}[ f(x) ]$$
What I find interesting is the lack of an absolute value sign here, and I will explain why. Consider the identity function $f(x)=x$, which is certainly 1-Lipschitz. Plugging it in gives the following inequality.
$$W_1(\mu, \nu ) \ge \mathbb{E}_{x \sim \mu}[ x ] - \mathbb{E}_{x \sim \nu}[ x ]$$
So the Wasserstein distance is an upper bound on the gap between the expected value of one distribution and that of another. But what if the right-hand side is very negative, with magnitude exceeding $W_1(\mu, \nu)$? For example:
$$-W_1(\mu, \nu) > \mathbb{E}_{x \sim \mu}[ x ] - \mathbb{E}_{x \sim \nu}[ x ]$$
This supposedly is OK, since the inequality above only bounds the difference from above. But if I reverse the roles of $\mu$ and $\nu$, which should be OK by the symmetric property of distance metrics, and apply the same inequality, then $$W_1(\nu, \mu) \ge \mathbb{E}_{x \sim \nu}[ x ] - \mathbb{E}_{x \sim \mu}[ x ] > W_1(\mu, \nu)$$
This contradicts the symmetric property, $W_1(\mu, \nu) = W_1(\nu, \mu)$.
So my conclusion is that $W_1(\mu, \nu)$ is not only an upper bound on $\mathbb{E}_{x \sim \mu}[ x ] - \mathbb{E}_{x \sim \nu}[ x ]$ but also an upper bound on $-( \mathbb{E}_{x \sim \mu}[ x ] - \mathbb{E}_{x \sim \nu}[ x ] )$. Equivalently, $-W_1(\mu, \nu)$ is a lower bound on $\mathbb{E}_{x \sim \mu}[ x ] - \mathbb{E}_{x \sim \nu}[ x ]$. I'm just not quite seeing how that lower bound manifests itself without an absolute value symbol in the definition.
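As a quick numerical sanity check of this two-sided bound, here is a minimal Python sketch. It assumes one-dimensional empirical measures with the same number of equally weighted samples, in which case $W_1$ reduces to the mean absolute difference between the sorted samples. The helper `w1_empirical` and the Gaussian samples are made up purely for illustration.

```python
import random


def w1_empirical(xs, ys):
    """W_1 between two 1-D empirical measures with equally many, equally weighted atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1-D with equal weights, an optimal coupling pairs the sorted samples.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(0)  # fixed seed so the run is reproducible
mu = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # samples standing in for mu
nu = [rng.gauss(2.0, 0.5) for _ in range(1000)]  # samples standing in for nu

w_mu_nu = w1_empirical(mu, nu)
w_nu_mu = w1_empirical(nu, mu)
gap = mean(mu) - mean(nu)  # E_mu[x] - E_nu[x], negative for these samples

print("W1(mu, nu) =", w_mu_nu)
print("W1(nu, mu) =", w_nu_mu)
print("E_mu[x] - E_nu[x] =", gap)
print("two-sided bound holds:", -w_mu_nu <= gap <= w_mu_nu)
```

In this setup the gap is negative, yet its magnitude should never exceed $W_1$, and swapping the arguments should leave $W_1$ unchanged.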
I did go through the derivation by Marco Cuturi in this video, which is the best I've seen yet. I feel I understand most of the steps of the proof, but I still don't quite see where this lower bound is enforced.
If I had to guess at the reason, it would be that if $f$ is a 1-Lipschitz function, then so is $-f$, so the inequality has to hold for $-f$ as well, which flips the signs and enforces the lower bound I am referring to. Is this in the right direction?
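Spelled out, the guess would go as follows (a sketch, assuming my reading is right; $d$ here is my notation for the ground metric, which is $|x - y|$ in the example above). If $f$ is 1-Lipschitz, then $$|(-f)(x) - (-f)(y)| = |f(x) - f(y)| \le d(x, y)$$ so $-f$ is 1-Lipschitz too. Applying the inequality to both $f$ and $-f$:
$$W_1(\mu, \nu ) \ge \int f d\mu - \int f d\nu \qquad \text{and} \qquad W_1(\mu, \nu ) \ge \int (-f) d\mu - \int (-f) d\nu = -\left( \int f d\mu - \int f d\nu \right)$$
Together these would give
$$W_1(\mu, \nu ) \ge \left| \int f d\mu - \int f d\nu \right|$$
and, for $f(x) = x$, $$W_1(\mu, \nu ) \ge \left| \mathbb{E}_{x \sim \mu}[ x ] - \mathbb{E}_{x \sim \nu}[ x ] \right|$$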
Thanks.