Multivariate Kolmogorov distance bounded by Wasserstein distance

477 Views Asked by At

I'm trying to find a bound for the multivariate Kolmogorov distance in terms of the Wasserstein distance. Denoting by $F$ and $G$ two cumulative distribution functions (cdf) on $\mathbb{R}^n$ the Kolmogorov and Wasserstein distances between $F$ and $G$ are given by \begin{align} d_{K}(F,G) &:= \sup_{\mathbf{x} \in \mathbb{R}^n} \left| F(\mathbf{x})- G(\mathbf{x}) \right|,\\ d_{W_1}(F,G) &:= \inf_{M_{F,G}} \left\{ \int_{\mathbb{R}^{n}\times\mathbb{R}^{n}} \Vert \mathbf{x}-\mathbf{y} \Vert dM_{F,G}(\mathbf{x},\mathbf{y}); M_{F,G} \text{ is a cdf with margins } F \text{ and } G \right\}, \end{align} where $\Vert \mathbf{x} \Vert$ is a vector norm on $\mathbb{R}^n$.

If $G$ has a density $g$ bounded by $C := \sup_{x\in\mathbb{R}} g(x)$ we have in the univariate case the following bound (see Lemma 1 in https://statweb.stanford.edu/~souravc/Lecture2.pdf): \begin{align} d_{K}(F,G) \leq 2 \sqrt{C} \sqrt{d_{W_1}(F,G)}. \end{align}

In the multivariate ($n$-dimensional) case I used the same approach to get the following result. If $G$ has a density $g$ bounded by $C := \sup_{\mathbf{x}\in\mathbb{R}^n} g(x)$ and $G$ is compactly supported on a set $D \subsetneq \mathbb{R}^n$ where $K = \max_{i=1,\ldots,n} \{ \sup_{\mathbf{x,y}\in D}|x_i-y_i| \}$ is the longest possible distance in one direction in $D$ then \begin{align} d_{K}(F,G) \leq 2 \sqrt{CnK^{n-1}} \sqrt{d_{W_1}(F,G)}. \end{align}

However, I would like to get rid of the bounded support assumption for $G$ (density, smoothness or moment assumptions for $F$ and $G$ would be fine) and I was wondering if anyone knows of a less restrictive result of the same flavor?

1

There are 1 best solutions below

1
On

The following result for the two-dimensional case can be straightforwardly generalised to higher dimensions.

Let $Q=\displaystyle\sup_{\bf{x}\in \mathbb{R}^2} \left\{ \int_0^x g(t,y)\mathrm{d}t+\int_0^y g(x,t) \mathrm{d}t\right\}$

Then $d_K(F,G)\le \sqrt{2Q}\sqrt{d_{W_1}}$

Proof:

Let the point $(x,y)$ be chosen to maximise $F(x,y)-G(x,y)$. Imagine that $f$ and $g$ are two different ways of distributing soil over the plane. Then the distance $d_K(F,G)$ is the difference between $f$ and $g$ in the amount of soil which is contained in the region $ S=\{(t_1,t_2):t_1 \le x, t_2\le y\}$.

The Wasserstein distance $d_{W_1}$ is the amount of soil that needs to be moved to transform $f$ into $g$ multiplied by the average distance that the soil needs to be moved.

If all of the soil which needs to be moved across the boundary of $S$ is right next to the boundary than the Wasserstein distance can be small. However, soil will have to be removed from $g$ on one or the other side of the boundary of $S$.

$Q$ is an upper bound on the 'cross-section' of $g$ near the boundary of $S$ and it can be deduced that a volume $\frac{d_K}2$ of earth must be spread out in such a way that it needs to be moved an average distance of at least $\frac{d_K}{2Q}$ to cross the boundary of $S$.

Therefore, $d_{W_1}\ge \frac{d_K^2}{4Q}$.