I have a question regarding the following concept equating total variation distance with a particular case of optimal transport.
I don't understand why equality (6.11) holds. We know by Kantorovich duality that the RHS is equal to $$2 \sup_{\substack{\phi \text{ Lipschitz} \\ \|\phi\|_{\text{Lip}} \leq 1}} \left( \int \phi \, d\mu - \int \phi \, d\nu \right) \equiv f(\mu, \nu)$$ since, for the cost $c(x, y) = 1(x \ne y)$, which is a distance, a function is $c$-convex if and only if it is $1$-Lipschitz.
As for the total variation, it is defined as $$T(\mu, \nu) \equiv \sup_{A \in \mathcal{F}} |\mu(A) - \nu(A)|$$ where $\mathcal{F}$ is our $\sigma$-algebra on whichever Polish space we're working with. It is obvious that $\phi(x) = 1_A(x)$ is $1$-Lipschitz with respect to $c$, and therefore $T(\mu, \nu) \leq f(\mu, \nu)$. I'm confused about why we need the factor $2$ here, and about how the other direction of the inequality would be shown.
Specifically, I need that for any $1$-Lipschitz function $\phi$, there exists a set $A \in \mathcal{F}$ such that $|\mu(A) - \nu(A)| \ge 2 \left( \int \phi \, d\mu - \int \phi \, d\nu \right)$, but I have no idea how to show this. Any help would be massively appreciated.
(The excerpt is from Villani (2009))

I think there is a difference in definitions. Look at the lecture notes Probability in High Dimensions by van Handel. In Example 4.14 the author writes:
$$ ||\mu - \nu||_{TV} = \inf_{M\in\mathcal C(\mu,\nu)}M(X\neq Y) $$
And he then goes on to prove this.
What might be happening is that the two sources use different definitions of the TV metric.
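Indeed, we can show that, with your definition of TV, the equality $$||\mu - \nu||_{TV} = \sup_A|\mu(A) - \nu(A)| = 2\inf P[X\neq Y]$$
cannot hold unless $\mu = \nu$. For any coupling $(X, Y)$ of $\mu$ and $\nu$ and any measurable set $A$:
$$\begin{aligned} \mu(A) - \nu(A) &= P[X \in A] - P[Y \in A] \\ &= P[X \in A, X=Y] + P[X \in A, X\neq Y] - P[Y \in A, X=Y] - P[Y \in A, X\neq Y] \\ &= P[X \in A, X\neq Y] - P[Y \in A, X \neq Y] \leq P[X\neq Y] \end{aligned}$$ The two $X = Y$ terms cancel because, on the event $\{X = Y\}$, $X \in A$ if and only if $Y \in A$. Swapping the roles of $X$ and $Y$ gives the same bound for $\nu(A) - \mu(A)$. Therefore, $$\sup_A|\mu(A) - \nu(A)| \leq P[X\neq Y]$$ for every coupling, and so $\sup_A|\mu(A) - \nu(A)| \leq \inf P[X\neq Y]$. Hence, $\sup_A|\mu(A) - \nu(A)|>0 \implies 2\inf P[X\neq Y] \geq 2\sup_A|\mu(A)-\nu(A)| > \sup_A|\mu(A)-\nu(A)|$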
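The factor of $2$ can be checked numerically on a small finite space. The following is only a sketch under stated assumptions: the two distributions are made-up examples, all function names are hypothetical, and `min_mismatch` does not optimize over couplings but uses the standard closed form $1 - \sum_i \min(\mu_i, \nu_i)$ for the mismatch probability of a maximal coupling on a finite space.

```python
from itertools import combinations


def tv_sup(mu, nu):
    """Brute-force sup over all subsets A of |mu(A) - nu(A)|."""
    n = len(mu)
    best = 0.0
    for r in range(n + 1):
        for A in combinations(range(n), r):
            diff = abs(sum(mu[i] for i in A) - sum(nu[i] for i in A))
            best = max(best, diff)
    return best


def min_mismatch(mu, nu):
    """P[X != Y] under a maximal coupling (assumed closed form, no LP solved)."""
    return 1.0 - sum(min(m, v) for m, v in zip(mu, nu))


def l1_distance(mu, nu):
    """Total mass of the signed measure mu - nu, i.e. sum_i |mu_i - nu_i|."""
    return sum(abs(m - v) for m, v in zip(mu, nu))


# Made-up example distributions on a three-point space.
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]

print("sup_A |mu(A) - nu(A)| :", tv_sup(mu, nu))
print("min P[X != Y]         :", min_mismatch(mu, nu))
print("sum_i |mu_i - nu_i|   :", l1_distance(mu, nu))
```

On this example the first two quantities agree up to floating-point error, while the third is twice as large, which is what one would expect if the $2$ in (6.11) comes from defining the TV norm as the total mass of $\mu - \nu$ rather than as $\sup_A |\mu(A) - \nu(A)|$.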