Let $(X,d)$ be a complete metric space and define $$\mathcal{P}_2(X) := \{ \mu \text{ Borel probability measure} \mid \int_X d^2(x,x_0) d\mu(x) < \infty \text{ for some } x_0 \in X \}$$ endowed with the Wasserstein distance $$ W^2_2(\mu, \nu) = \inf \{ \int_{X \times X} d^2(x,y) d\pi(x,y) \mid \pi \in \Gamma(\mu, \nu) \} $$ where $\Gamma(\mu, \nu)$ is the set of probability measures on $X \times X$ whose marginals are $\mu$ and $\nu$.
In this setting, I am trying to understand the proof of
Theorem If $\mu \in \mathcal{P}_2(X)$ and $ \{ \mu_n \} \subset \mathcal{P}_2(X)$ then $$ \mu_n \overset{W_2}{\longrightarrow} \mu \Leftrightarrow \biggl [ \mu_n \rightharpoonup \mu \text{ and } \int_X d^2(x,x_0) d\mu_n \longrightarrow \int_X d^2(x,x_0)d\mu \text{ for some }x_0 \in X \biggr ]$$
There are three steps in the proof that I can't completely understand (assume $(X,d)$ is compact for the first two):
Let $$Z:= \{ f \in \text{Lip}_1(X,d) \mid f(x_0)=0 \} $$ then $$\sup_{f \in \text{Lip}_1(X,d)} \biggl | \int_X f d(\mu_n -\mu) \biggr |= \sup_{f \in Z} \biggl | \int_X f d(\mu_n -\mu) \biggr | $$
Let $A \subset X$ be an open subset, then $$ \liminf_{n} \int_A d^2(x,x_0)d\mu_n \ge \int_A d^2(x,x_0)d\mu $$
Given a sequence of compact subsets $ \{ K_k \}_{k \ge 1}$ s.t. $$ \lim_{k \to +\infty} \sup_n \int_{X \setminus K_k} d^2(x_0, \cdot)d\mu_n =0$$ define $$\mu_{n,k} := \mu_n |_{K_k} + (1-\mu_n(K_k))\delta_{x_0} $$ then, up to subsequences, $ \{ \mu_{n,k} \}_{n}$ is weakly convergent.
Any hint would be greatly appreciated!
I can, at the very least, offer a proof of the $(\implies)$ direction, under the additional assumption that closed balls in $X$ are compact.
Proof: Suppose $W_{2}(\mu_{n},\mu)\to 0$ and consider the Dirac mass $\delta_{x_{0}}$. Since $W_{2}$ is a distance on the second moment space $\mathcal{P}_{2}(X)$, the triangle inequality gives
$W_{2}(\mu_{n},\delta_{x_{0}})\leq W_{2}(\mu_{n},\mu)+W_{2}(\mu,\delta_{x_{0}})\implies W_{2}(\mu_{n},\delta_{x_{0}})-W_{2}(\mu,\delta_{x_{0}})\leq W_{2}(\mu_{n},\mu).$ Exchanging the roles of $\mu_{n}$ and $\mu$ gives $W_{2}(\mu,\delta_{x_{0}})-W_{2}(\mu_{n},\delta_{x_{0}})\leq W_{2}(\mu_{n},\mu),$ and therefore
$|W_{2}(\mu_{n},\delta_{x_{0}})-W_{2}(\mu,\delta_{x_{0}})|\leq W_{2}(\mu_{n},\mu).$
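The only coupling of a measure $\nu$ with $\delta_{x_{0}}$ is the product $\nu\otimes\delta_{x_{0}}$, so the definition of $W_{2}$ gives $W_{2}(\nu,\delta_{x_{0}})=\sqrt{\int d^{2}(x,x_{0})d\nu(x)}$. Hence $\Big|\sqrt{\int d^{2}(x,x_{0})d\mu_{n}(x)}-\sqrt{\int d^{2}(x,x_{0})d\mu(x)}\Big|=|W_{2}(\mu_{n},\delta_{x_{0}})-W_{2}(\mu,\delta_{x_{0}})|\leq W_{2}(\mu_{n},\mu)\to 0,$ which establishes $\int d^{2}(\cdot,x_{0})d\mu_{n}\to \int d^{2}(\cdot,x_{0})d\mu$.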
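As a numerical sanity check of the inequality just used, here is a minimal sketch under simplifying assumptions that are not part of the proof: $X=\mathbb{R}$ with the usual distance, $x_0=0$, and uniform empirical measures with equally many atoms. On the real line the sorted pairing is an optimal coupling for the quadratic cost, which is what the hypothetical helper `w2_empirical_1d` relies on; the helper names and the perturbation model are illustrative only.

```python
import numpy as np

def w2_empirical_1d(xs, ys):
    """W_2 between two uniform empirical measures on R with equally many atoms.

    Assumption: on the real line the monotone (sorted) pairing is optimal
    for the quadratic cost, so W_2^2 is the mean squared gap between
    order statistics.
    """
    xs, ys = np.sort(xs), np.sort(ys)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

def root_second_moment(xs, x0=0.0):
    """sqrt of the second moment about x0, i.e. W_2(mu, delta_{x0})."""
    return float(np.sqrt(np.mean((xs - x0) ** 2)))

rng = np.random.default_rng(0)
mu = rng.normal(size=2000)  # atoms of the limit measure mu
results = []
for n in (1, 10, 100, 1000):
    # atoms of mu_n: a perturbation of mu that shrinks as n grows
    mu_n = mu + rng.normal(size=mu.size) / n
    gap = abs(root_second_moment(mu_n) - root_second_moment(mu))
    dist = w2_empirical_1d(mu_n, mu)
    results.append((n, gap, dist))
    print(f"n={n:5d}  moment gap={gap:.6f}  W_2={dist:.6f}")
```

In each printed row the moment gap should not exceed $W_2$, and both should shrink as $n$ grows.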
Now, assuming closed balls of $X$ are compact, the weak convergence $\mu_{n}\rightharpoonup \mu$ follows as well. The subtlety is that tightness has to be established first. Indeed, for the closed ball $B_{R}(x_{0})$ of radius $R>0$ centered at $x_{0}$, the estimate
$R^{2}\mu_{n}(X\setminus B_{R}(x_{0}))\leq \int_{X}d^{2}(x,x_{0})d\mu_{n}(x)$, together with the uniform bound on $\int d^{2}(x,x_{0})d\mu_{n}(x)$ (these integrals converge, hence are bounded), shows that the sequence $\{\mu_{n}\}$ is tight. Hence, it is enough to show convergence against compactly supported continuous functions, $f\in C_{c}(X)$. Now, as the Lipschitz continuous functions are dense in the compactly supported continuous functions with respect to the topology of uniform convergence, it suffices to check the weak convergence for Lipschitz continuous functions.
To that end, fix $f$ Lipschitz; that is, there is a constant $C>0$ such that $|f(x)-f(y)|\leq Cd(x,y)$ for all $x,y\in X$. Then $\Big|\int f(x)d\mu_{n}(x)-\int f(y)d\mu(y)\Big|=\Big|\int (f(x)-f(y))d\pi_{n}(x,y)\Big|\leq C\int d(x,y)d\pi_{n}(x,y)=C\Big(\Big(\int d(x,y)d\pi_{n}(x,y)\Big)^{2}\Big)^{1/2}\leq C\Big(\int d^{2}(x,y)d\pi_{n}(x,y)\Big)^{1/2}=CW_{2}(\mu_{n},\mu),$
where $\pi_{n}$ is an optimal coupling of $\mu_{n}$ and $\mu$, that is, a probability measure in $\Gamma(\mu_{n},\mu)$ attaining the infimum in the definition of $W_{2}(\mu_{n},\mu)$. In the first equality we used that the first marginal of $\pi_{n}$ is $\mu_{n}$, so that $\int_{X} f(x)d\mu_{n}(x)=\int_{X\times X}f(x)d\pi_{n}(x,y)$, and similarly $\int_{X} f(y)d\mu(y)=\int_{X \times X}f(y)d\pi_{n}(x,y)$ because the second marginal is $\mu$. We also applied Jensen's inequality in the last inequality. The result follows as we are assuming $W_{2}(\mu_{n},\mu)\to 0.$
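The Lipschitz bound can also be checked numerically. The following is only a sketch under assumptions that are not part of the proof: $X=\mathbb{R}$, uniform empirical measures with equally many atoms (so that the sorted pairing is an optimal coupling), and a hypothetical test function $f = C\sin$, which is bounded and $C$-Lipschitz.

```python
import numpy as np

def w2_empirical_1d(xs, ys):
    """W_2 between two uniform empirical measures on R with equally many atoms,
    computed from the sorted pairing (assumed optimal on the real line)."""
    xs, ys = np.sort(xs), np.sort(ys)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

C = 3.0  # Lipschitz constant of the hypothetical test function below

def f(x):
    # C * sin is bounded and C-Lipschitz, since its derivative is bounded by C
    return C * np.sin(x)

rng = np.random.default_rng(1)
mu = rng.uniform(-2.0, 2.0, size=2000)  # atoms of the limit measure mu
checks = []
for n in (1, 10, 100, 1000):
    # atoms of mu_n: mu shifted and blurred by an amount of order 1/n
    mu_n = mu + (1.0 + rng.normal(size=mu.size)) / n
    lhs = abs(float(np.mean(f(mu_n)) - np.mean(f(mu))))
    rhs = C * w2_empirical_1d(mu_n, mu)
    checks.append((n, lhs, rhs))
    print(f"n={n:5d}  |int f d(mu_n - mu)|={lhs:.6f}  C*W_2={rhs:.6f}")
```

Each row should show the left-hand side bounded by $C\,W_2(\mu_n,\mu)$, with both tending to zero.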
Remark. This is a proof I learned from the paper "A user's guide to optimal transport" by Ambrosio and Gigli.