Why is $$ C(\mathbb{Q}, \mathbb{P}):=\inf _{\mathbb{M}} \sqrt{\int \sum_{i=1}^{n}\left(\mathbb{M}\left[Y_{i} \neq x_{i} \mid X_{i}=x_{i}\right]\right)^{2} d \mathbb{P}(x)}, $$ where the infimum ranges over all couplings $\mathbb{M}$ of the pair $(\mathbb{Q}, \mathbb{P})$,
equivalent to $$ C(\mathbb{Q}, \mathbb{P})=\sqrt{\int\left|1-\frac{d \mathbb{Q}}{d \mathbb{P}}(x)\right|_{+}^{2} d \mathbb{P}(x)}, $$ where $t_{+}:=\max\{0, t\}$?
This result is presented in Section 3.3.5 of the book High-dimensional statistics: A non-asymptotic viewpoint (Wainwright, 2019).
Short explanation: Both sides are measures of the total variation distance.
Note: This is just an attempt at explanation. Maybe someone more knowledgeable than me in this area can write a much better answer.
Setting: First the author takes up a simpler instance, a single random variable $X$, in Section 3.3.1.
$\mathbb{Q}$ and $\mathbb{P}$ are probability distributions over $X$. A distribution $\mathbb M$ on the product space $X \otimes X$ is a coupling of the pair $(\mathbb{Q}, \mathbb{P})$ if its marginal distributions on the first and second coordinates coincide with $\mathbb{Q}$ and $\mathbb{P}$, respectively.
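In fact, by (3.55), $\sup\limits_{\|f\|_{\mathrm{Lip}} \leq 1}\int f(d\mathbb Q - d\mathbb P) = \inf\limits_{\mathbb M}\mathbb E_{\mathbb M}[\rho(X,X')]$, where the supremum is over functions $f$ that are 1-Lipschitz with respect to the metric $\rho$ and the infimum is over couplings $\mathbb M$ of $(\mathbb{Q}, \mathbb{P})$.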
Now we go from one random variable $X$ to a sequence of random variables $\{X_1,\dots, X_n\}$ (generated by a Markov chain) in Section 3.3.4. We wish to find the total variation distance between two distributions $(\mathbb{Q}, \mathbb{P})$ on these $n$ random variables.
Finally, note the definition of the Radon-Nikodym derivative (which gives the definition of $\frac{d \mathbb Q}{d \mathbb P}$). This is defined in Section 3.3.2 of the book.
The first expression is a measure of the distance between two distributions
For an explanation of why the infimum, over couplings, of the probability that $X\neq Y$ is the total variation distance, see Lemma 4.11 from this document by Sebastien Roch:
$\|\mathbb P - \mathbb Q\|_{TV} = \inf\{\operatorname{Prob}(X\neq Y) : (X,Y) \text{ is a coupling of }\mathbb P\text{ and }\mathbb Q\}$.
Also see Proposition 1.7 of this linked document by Parimal Parag.
In our case we have to sum these over the $n$ random variables $\{X_1,\dots ,X_n\}$.
Note: The linked file also has some good explanations of what coupling means.
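The coupling characterization of total variation can be checked numerically on a small discrete example. The following is only a rough sketch: the function name `maximal_coupling` and the distributions `p` and `q` are made up for illustration, and `p` is assumed to be strictly positive so that the conditional probabilities are defined. It builds one particular (maximal) coupling and evaluates the quantities for it; it does not search over all couplings.

```python
import math

def maximal_coupling(p, q):
    """Return a joint pmf M (rows: X ~ p, columns: Y ~ q) that puts as much
    mass as possible on the diagonal {X = Y}, together with the TV distance."""
    n = len(p)
    overlap = [min(a, b) for a, b in zip(p, q)]
    tv = 1.0 - sum(overlap)          # equals 0.5 * sum |p - q|
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = overlap[i]
        if tv > 0:
            for j in range(n):
                # Spread the leftover mass of p and q off the diagonal.
                M[i][j] += (p[i] - overlap[i]) * (q[j] - overlap[j]) / tv
    return M, tv

# Hypothetical example distributions (p must be strictly positive).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
n = len(p)

M, tv = maximal_coupling(p, q)

# Probability that the two coordinates differ under this coupling.
mismatch = 1.0 - sum(M[i][i] for i in range(n))

# Conditional mismatch M[Y != x | X = x] versus the closed form (1 - q/p)_+.
cond = [1.0 - M[i][i] / p[i] for i in range(n)]
closed = [max(0.0, 1.0 - q[i] / p[i]) for i in range(n)]

# Single-coordinate (n = 1 in the question's notation) version of both expressions.
lhs = math.sqrt(sum(p[i] * cond[i] ** 2 for i in range(n)))
rhs = math.sqrt(sum(p[i] * closed[i] ** 2 for i in range(n)))

print("TV distance:", tv)
print("P(X != Y) under the maximal coupling:", mismatch)
print("coupling expression:", lhs, "closed form:", rhs)
```

For this coupling the mismatch probability coincides with the total variation distance, and the conditional mismatch probabilities coincide with $\left|1-\frac{d\mathbb Q}{d\mathbb P}(x)\right|_+$ pointwise, which is the single-variable version of the identity in the question.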
The second expression is a measure of the Wasserstein distance between two distributions
Note that the outer part of the formula comes from the definition of the Wasserstein distance. There are various types of Wasserstein distance. See Table 1 in this linked document by Yuhang Cai and Lek-Heng Lim for the various distances in use, including the one appearing in the second expression.
With this in mind, we have that both are different measures of the divergence between distributions. Are they equivalent? See lemma 2 of this attached document by PM Samson
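For a single coordinate ($n = 1$) the equivalence can be sketched directly. This is only a rough sketch, assuming a discrete $\mathbb{P}$ with $\mathbb{P}(x) > 0$, and it is not the general argument from the references. For any coupling $\mathbb{M}$ with $X \sim \mathbb{P}$ and $Y \sim \mathbb{Q}$,
$$ \mathbb{M}[Y = x \mid X = x]\,\mathbb{P}(x) = \mathbb{M}[X = x,\, Y = x] \leq \min\{\mathbb{P}(x), \mathbb{Q}(x)\}, $$
so that
$$ \mathbb{M}[Y \neq x \mid X = x] \geq 1 - \min\left\{1, \frac{\mathbb{Q}(x)}{\mathbb{P}(x)}\right\} = \left|1 - \frac{d \mathbb{Q}}{d \mathbb{P}}(x)\right|_{+}. $$
A maximal coupling, which places mass $\min\{\mathbb{P}(x), \mathbb{Q}(x)\}$ on each diagonal point $(x, x)$, attains equality for every $x$ at once. Squaring, integrating against $\mathbb{P}$ and taking the square root then gives
$$ \inf_{\mathbb{M}} \sqrt{\int \left(\mathbb{M}[Y \neq x \mid X = x]\right)^{2} d\mathbb{P}(x)} = \sqrt{\int \left|1 - \frac{d \mathbb{Q}}{d \mathbb{P}}(x)\right|_{+}^{2} d\mathbb{P}(x)}. $$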