$W_p(\mu,\mu_{\epsilon})\to 0$ when $\mu_{\epsilon}$ is a mollified version of $\mu$.


I would like to show that the Wasserstein distance between a probability measure $\mu$ (on $\mathbb{R}^d$) and a mollified version of it, $\mu_{\epsilon}:=\mu\ast\rho_\epsilon$ where $\mu\ast\rho_\epsilon(x):=\int \rho_{\epsilon}(x-y)\,d\mu(y)$, goes to 0, i.e. $$\lim\limits_{\epsilon\to 0}W_p(\mu,\mu_{\epsilon})=0.$$ I was told that to do this I can show that $$W_p(\mu,\mu_{\epsilon})\le\epsilon\, m_p(\rho),$$ where $m_p(\rho):=\left(\int \|x\|^p\rho(x)\,dx\right)^{1/p}$ denotes the $p$-th moment of $\rho$. I think I have to come up with a particular transport plan between $\mu$ and $\mu_{\epsilon}$. I tried the most obvious one to me, namely the product measure $\mu\times\mu_{\epsilon}$, but I did not reach the desired conclusion.

Edit

Using the plan $\mu\times\mu_{\epsilon}$ I get \begin{align} W_p^p(\mu,\mu_{\epsilon})&\le \int d(x,y)^p \,d\mu(x)\,d\mu_\epsilon(y)\\ &= \int d(x,y)^p\,\mu\ast\rho_{\epsilon}(y)\,d\mu(x)\, dy\\ &= \int d(x,y)^p\left(\int\rho_{\epsilon}(y-z)\,d\mu(z)\right)d\mu(x)\, dy, \end{align} but now I am stuck.

Best answer

The problem with using the product measure $\mu \otimes \mu_{\epsilon}$ is that it does not use the form of $\mu_{\epsilon}$ at all. Convolutions have a very specific probabilistic interpretation, as do couplings, and the $p$-th moment you wrote also frequently appears in probabilistic computations. Hence a second guess is to exploit this connection.

Recall that if $Y$ is a random vector with distribution $\mu$ and $X_{\epsilon}$ is a random vector with distribution $\rho_{\epsilon}(x) \, dx$, and if $Y$ and $X_{\epsilon}$ are independent, then $X_{\epsilon} + Y$ has distribution $\rho_{\epsilon} * \mu = \mu_{\epsilon}$. Accordingly, if $\nu$ is the distribution of the pair $(X_{\epsilon} + Y,Y)$, then the first marginal $\pi_{1*} \nu$ equals $\mu_{\epsilon}$ and the second marginal $\pi_{2*}\nu$ equals $\mu$. In particular, $\nu$ is a coupling of $\mu_{\epsilon}$ and $\mu$.
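A quick numerical sanity check of this coupling (a sketch in $d = 1$, taking $\mu$ and $\rho$ to be standard Gaussians purely for illustration; then $\mu_\epsilon = N(0, 1 + \epsilon^2)$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 200_000, 0.3

Y = rng.standard_normal(n)            # Y ~ mu = N(0, 1)
X_eps = eps * rng.standard_normal(n)  # X_eps ~ rho_eps = N(0, eps^2), independent of Y
S = X_eps + Y                         # S ~ mu_eps = rho_eps * mu = N(0, 1 + eps^2)

# The pair (S, Y) samples the coupling nu: first marginal mu_eps, second mu.
print(np.var(S))  # close to 1 + eps^2 = 1.09
print(np.var(Y))  # close to 1
```

The sample variances confirm that the two marginals of $(X_\epsilon + Y, Y)$ are $\mu_\epsilon$ and $\mu$, as claimed.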

Also, I'm assuming that $d(x,y) = \|x - y\|$, where $\|\cdot\|$ is the Euclidean norm. However, all that is really necessary is that $d$ is translationally invariant (i.e. $d(x + z, y + z) = d(x,y)$ for all $x,y, z \in \mathbb{R}^{d}$). In particular, other norms would also work.

With the coupling $\nu$ in hand, let's try to bound $W_{p}(\mu_{\epsilon},\mu)$. I will use the random variables to do the computation since it is much clearer. If that's not your style, it would be a good exercise to write down a purely analytic proof. We find: \begin{align*} W_{p}(\mu_{\epsilon},\mu)^{p} &\leq \int_{\mathbb{R}^{d} \times \mathbb{R}^{d}} \|x - y\|^{p} \, \nu(dx \otimes dy) \\ &= \mathbb{E}(\|(X_{\epsilon} +Y) - Y\|^{p}) \\ &= \mathbb{E}(\|X_{\epsilon}\|^{p}) \\ &= \int_{\mathbb{R}^{d}} \|x\|^{p} \rho_{\epsilon}(x) \, dx \\ &= \epsilon^{p} \int_{\mathbb{R}^{d}} \|\epsilon^{-1} x\|^{p} \rho(\epsilon^{-1}x) \, \frac{dx}{\epsilon^{d}} \\ &= \epsilon^{p} M_{p}(\rho), \end{align*} where the $p$th moment of $\rho$ is $M_{p}(\rho) = \int_{\mathbb{R}^{d}} \|y\|^{p} \rho(y) \, dy$. Taking $p$-th roots yields $W_{p}(\mu_{\epsilon},\mu) \leq \epsilon \, M_{p}(\rho)^{1/p}$, which is exactly the claimed bound (note $m_{p}(\rho) = M_{p}(\rho)^{1/p}$ in your notation).
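The bound can be checked numerically. A sketch for $p = 1$, $d = 1$, again with $\mu$ and $\rho$ standard Gaussian (so $M_1(\rho) = \mathbb{E}|X| = \sqrt{2/\pi}$); `scipy.stats.wasserstein_distance` computes the empirical $W_1$ between two samples:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
n, eps = 100_000, 0.3

mu_samples = rng.standard_normal(n)                              # sample of mu
mu_eps_samples = rng.standard_normal(n) + eps * rng.standard_normal(n)  # sample of mu_eps

w1 = wasserstein_distance(mu_samples, mu_eps_samples)  # empirical W_1(mu, mu_eps)
bound = eps * np.sqrt(2 / np.pi)                       # eps * M_1(rho)
print(w1, bound)  # the empirical distance sits well below the bound
```

For Gaussians the true $W_1$ is in fact of order $\epsilon^2$, so the coupling bound, while not sharp here, comfortably certifies convergence.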

It is worth noting (and can be used to simplify the computation above somewhat) that $X_{\epsilon}$ has the same distribution as $\epsilon X$, where $X$ has distribution $\rho(x) \, dx$. In this sense, what we are doing in the last three lines above is simply computing $\mathbb{E}(\|\epsilon X\|^{p}) = \epsilon^{p} \mathbb{E}(\|X\|^{p})$ --- another indication that the probabilistic interpretation is very convenient here.

Hint to "un-probability" the computation above: Let $\nu = \nu_{\epsilon}$ be the probability measure on $\mathbb{R}^{d} \times \mathbb{R}^{d}$ given by its action on $f \in C_{c}(\mathbb{R}^{d} \times \mathbb{R}^{d})$ by \begin{equation*} \int_{\mathbb{R}^{d} \times \mathbb{R}^{d}} f(x,y) \nu_{\epsilon}(dx \otimes dy) = \int_{\mathbb{R}^{d}} \left( \int_{\mathbb{R}^{d}} f(y' + x',y') \rho_{\epsilon}(x') \, dx' \right) \mu(dy'). \end{equation*} Check that $\nu_{\epsilon}$ as defined is really a coupling of $\mu$ and $\mu_{\epsilon}$. It is also worth verifying that $\nu_{\epsilon}$ can be rewritten as \begin{equation*} \int_{\mathbb{R}^{d} \times \mathbb{R}^{d}} f(x,y) \nu_{\epsilon}(dx \otimes dy) = \int_{\mathbb{R}^{d}} \left( \int_{\mathbb{R}^{d}} f(y' + \epsilon x', y') \rho(x') \, dx' \right) \mu(dy'). \end{equation*} Hence $\nu_{\epsilon}$ is looking very close to the diagonal. Finally, check that \begin{equation*} \int_{\mathbb{R}^{d} \times \mathbb{R}^{d}} \|x-y\|^{p} \nu_{\epsilon}(dx \otimes dy) = \int_{\mathbb{R}^{d}} \|x'\|^{p} \rho_{\epsilon}(x') \, dx' \end{equation*} and complete the argument as in the answer above.

Edit: The product measure will not give a useful estimate. To see this, consider the case when $\mu$ is Gaussian, that is, $\mu(dx) = (2 \pi)^{-d/2}\exp(-\|x\|^{2}/2) \, dx$ and let $\rho$ also be Gaussian: \begin{equation*} \rho(x) = (2 \pi)^{-d/2} \exp(-\|x\|^{2}/2). \end{equation*} One can then check that $\mu \otimes \mu_{\epsilon}$ is nothing other than two independent Gaussians, the second with variance $1 + \epsilon^{2}$: \begin{equation*} (\mu \otimes \mu_{\epsilon})(dx \otimes dy) = (2 \pi)^{-d} (1+\epsilon^{2})^{-d/2} \exp \left(- \|x\|^{2}/2 - \|y\|^{2}/(2(1+\epsilon^{2}))\right) \, dx \, dy. \end{equation*} Now in the easy case when $p = 2$, writing $\|x - y\|^{2} = \|x\|^{2} + \|y\|^{2} - 2 \langle x,y \rangle$ and eliminating the cross term (it has zero expectation by independence) leads to \begin{equation*} \int_{\mathbb{R}^{d} \times \mathbb{R}^{d}} \|x - y\|^{2} (\mu \otimes \mu_{\epsilon})(dx \otimes dy) = \int_{\mathbb{R}^{d}} \|x\|^{2} \mu(dx) + \int_{\mathbb{R}^{d}} \|y\|^{2} \mu_{\epsilon}(dy) = d(2 + \epsilon^{2}), \end{equation*} which does not vanish as $\epsilon \to 0$.
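This failure is easy to see numerically. A sketch in $d = 1$: under the product coupling the quadratic cost $\mathbb{E}|x - y|^2 = 1 + (1 + \epsilon^2) = 2 + \epsilon^2$ stays near $2$ instead of going to $0$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
costs = {}
for eps in (0.5, 0.1, 0.01):
    x = rng.standard_normal(n)                        # x ~ mu = N(0, 1)
    y = np.sqrt(1 + eps**2) * rng.standard_normal(n)  # y ~ mu_eps = N(0, 1 + eps^2), independent
    costs[eps] = np.mean((x - y) ** 2)                # approx 2 + eps^2
print(costs)  # all values near 2, regardless of eps
```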