I am reading Villani's Optimal Transport: Old and New.
Theorem 4.1 concerns the existence of an optimal coupling between any two Polish probability spaces $(\mathcal{X}, \mu)$ and $(\mathcal{Y}, \nu)$. The proof begins as follows (note that $P(\mathcal{X})$ denotes the set of probability distributions on $\mathcal{X}$, and $\Pi(\mu, \nu)$ the set of all joint distributions on $\mathcal{X} \times \mathcal{Y}$ with marginals $\mu$ and $\nu$ respectively):
Since $\mathcal{X}$ is Polish, $\{\mu\}$ is tight in $P(\mathcal{X})$; similarly, $\{\nu\}$ is tight in $P(\mathcal{Y})$. By Lemma 4.4, $\Pi(\mu, \nu)$ is tight in $P(\mathcal{X} \times \mathcal{Y})$, and by Prokhorov's theorem this set has a compact closure. By passing to the limit in the equation for marginals, we see that $\Pi(\mu, \nu)$ is closed, so it is in fact compact.
I am trying to understand the last claim ("By passing to the limit..."): it is unclear which "equation for marginals" is meant here. Essentially, what I want to show is that $\Pi(\mu, \nu)$ is closed in the weak topology.
The best I can do is as follows: suppose we have $\pi_k \to \pi$ in the weak topology, with each $\pi_k \in \Pi(\mu, \nu)$. Then for any Borel set $A \subset \mathcal{X}$, we have $\pi_k(A \times \mathcal{Y}) = \mu(A)$ (and similarly for $\nu$), so we would be done if we had $\pi_k(A \times \mathcal{Y}) \to \pi(A \times \mathcal{Y})$.
However, as far as I understand, weak convergence of $\pi_k$ to $\pi$ only guarantees $\pi_k(B) \to \pi(B)$ when $\pi(\partial B) = 0$ (with $\partial B$ denoting the boundary of $B$). Presumably there are sets $A$ for which $A \times \mathcal{Y}$ fails this condition.
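To make the concern concrete, here is a standard one-dimensional example (not taken from Villani's text) in which convergence fails on a set whose boundary carries mass. On $\mathbb{R}$, the Dirac masses $\delta_{1/k}$ converge weakly to $\delta_0$, yet for $B = \{0\}$:

$$\delta_{1/k}(B) = 0 \ \text{ for all } k \ge 1, \qquad \delta_0(B) = 1, \qquad \delta_0(\partial B) = \delta_0(\{0\}) = 1.$$

This only illustrates the general caveat about weak convergence; it says nothing about sequences inside $\Pi(\mu, \nu)$, whose marginals are fixed.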
Can anyone see how to make sense of this "passing to the limit" step, or how else to obtain this result?
I think it is easiest to prove it this way:
Now in your setup, let $f : \mathcal{X} \to \mathbb{R}$ be an arbitrary bounded continuous function. By assumption we have $\int_{\mathcal{X} \times \mathcal{Y}} f(x)\,\pi_k(dx,dy) = \int_{\mathcal{X}} f(x)\,\mu(dx)$ for each $k$. Letting $k \to \infty$ and using Proposition 1, we have $\int_{\mathcal{X} \times \mathcal{Y}} f(x)\,\pi(dx,dy) = \int_{\mathcal{X}} f(x)\,\mu(dx)$ as well. Now Proposition 2 tells us $\pi(\cdot \times \mathcal{Y}) = \mu$.
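Spelled out, the limit step is the chain of equalities below. This is a sketch that assumes Proposition 1 is the usual characterization of weak convergence by bounded continuous test functions. It is applied to $g(x,y) = f(x)$ (the name $g$ is introduced here only for illustration), which is bounded and continuous on $\mathcal{X} \times \mathcal{Y}$ because $f$ is:

$$
\begin{aligned}
\int_{\mathcal{X} \times \mathcal{Y}} g(x,y)\,\pi(dx,dy)
&= \lim_{k \to \infty} \int_{\mathcal{X} \times \mathcal{Y}} g(x,y)\,\pi_k(dx,dy) \\
&= \lim_{k \to \infty} \int_{\mathcal{X}} f(x)\,\mu(dx) \\
&= \int_{\mathcal{X}} f(x)\,\mu(dx).
\end{aligned}
$$

The same argument with a bounded continuous function of $y$ alone handles the second marginal $\nu$.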
Here is another way:
Proof. Use the Dynkin $\pi$-$\lambda$ lemma. Let $\mathcal{L} = \{B \in \mathcal{B} : \mu(B) = \nu(B)\}$, where $\mathcal{B}$ is the Borel $\sigma$-algebra; it is routine to show that $\mathcal{L}$ is a $\lambda$-system.
Now let $\mathcal{P} = \{ B \in \mathcal{B} : \mu(\partial B) = \nu(\partial B) = 0\}$. By assumption, $\mathcal{P} \subset \mathcal{L}$. For $B_1, B_2 \in \mathcal{P}$, we have $\partial(B_1 \cap B_2) \subset \partial B_1 \cup \partial B_2$ and thus $B_1 \cap B_2 \in \mathcal{P}$, so $\mathcal{P}$ is a $\pi$-system.
We have to show that $\sigma(\mathcal{P}) = \mathcal{B}$. Fix a compatible metric $d$; in a separable metric space every open set is a countable union of open balls, so it suffices to show that every open ball is in $\sigma(\mathcal{P})$. Fix $x \in X$ and let $B(x,r)$ denote the open ball of radius $r$ centered at $x$. Note that the boundaries $\partial B(x,r)$, $r > 0$, are pairwise disjoint, since $\partial B(x,r) \subset \{y : d(x,y) = r\}$. Because $\mu$ is a finite measure, the set $R_x^\mu = \{r > 0 : \mu(\partial B(x,r)) > 0\}$ is therefore at most countable, and the same holds for $R_x^\nu$. So the set $S = (0,\infty) \setminus (R_x^\mu \cup R_x^\nu)$ is dense in $(0, \infty)$. Hence for any $r > 0$ we can find $r_n \in S$ with $r_n \uparrow r$. Then $B(x,r_n) \in \mathcal{P}$ and $\bigcup_n B(x,r_n) = B(x,r)$. So $B(x,r) \in \sigma(\mathcal{P})$, where $x$ and $r$ were arbitrary.
By Dynkin's lemma we conclude that $\mathcal{B} \subset \mathcal{L}$, which is to say that $\mu = \nu$.
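The countability claim can be checked with a counting bound. The following is a sketch, assuming $\mu$ is a probability measure (total mass $1$); the sets $R_{x,n}^\mu$ are notation introduced here for illustration. For each integer $n \ge 1$ let $R_{x,n}^\mu = \{r > 0 : \mu(\partial B(x,r)) > 1/n\}$. If $r_1, \dots, r_m$ are distinct elements of $R_{x,n}^\mu$, then disjointness of the boundaries gives

$$\frac{m}{n} < \sum_{i=1}^m \mu(\partial B(x,r_i)) = \mu\Big(\bigcup_{i=1}^m \partial B(x,r_i)\Big) \le 1,$$

so $m < n$. Each $R_{x,n}^\mu$ is therefore finite, and $R_x^\mu = \bigcup_{n \ge 1} R_{x,n}^\mu$ is a countable union of finite sets.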