Strong equivalence between Lévy’s metric and a topologically equivalent metric


Let $\mathscr B$ be the Borel $\sigma$-algebra on $\mathbb R$ and let $\mathscr P$ denote the set of all probability measures on the measurable space $(\mathbb R,\mathscr B)$.


Lévy’s metric on $\mathscr P$ is defined as $$d(\mathbb P_1,\mathbb P_2)\equiv\inf\{\varepsilon>0\,|\,\mathbb P_2(-\infty,x-\varepsilon]-\varepsilon\leq\mathbb P_1(-\infty,x]\leq\mathbb P_2(-\infty,x+\varepsilon]+\varepsilon\text{ for each $x\in\mathbb R$}\}$$ for $\mathbb P_1,\mathbb P_2\in\mathscr P$.


Now define another metric on $\mathscr P$ as follows: $$\rho(\mathbb P_1,\mathbb P_2)\equiv\sup_{t\in\mathbb R}\left\{\frac{|\varphi_1(t)-\varphi_2(t)|}{1+|t|}\right\}$$ for $\mathbb P_1,\mathbb P_2\in\mathscr P$, where $\varphi_i:\mathbb R\to\mathbb C$ is the characteristic function of $\mathbb P_i$: $$\varphi_i(t)\equiv\int_{x\in\mathbb R}\cos(tx)\,\mathrm d\mathbb P_i(x)+\mathsf{i}\int_{x\in\mathbb R}\sin(tx)\,\mathrm d\mathbb P_i(x)\quad\text{for $t\in\mathbb R$ and $i\in\{1,2\}$}.$$
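Similarly, $\rho$ can be approximated by evaluating the ratio on a fine grid of $t$ values: since $|\varphi_1-\varphi_2|\leq 2$, the supremum over $|t|>T$ is at most $2/(1+T)$, so truncating to a bounded interval is harmless. Again, this is only a sketch; the helper name and grid parameters are my own assumptions.

```python
import numpy as np

def rho_distance(phi1, phi2, t_max=1000.0, n_points=200001):
    """Approximate rho(P1, P2) = sup_t |phi1(t) - phi2(t)| / (1 + |t|).

    Because |phi1 - phi2| <= 2, the supremum over |t| > t_max is at most
    2 / (1 + t_max), so restricting to [-t_max, t_max] loses little.
    """
    t = np.linspace(-t_max, t_max, n_points)
    return np.max(np.abs(phi1(t) - phi2(t)) / (1.0 + np.abs(t)))

# Illustration: N(0, 1) versus N(1, 1)
phi1 = lambda t: np.exp(-t**2 / 2)           # characteristic function of N(0, 1)
phi2 = lambda t: np.exp(1j * t - t**2 / 2)   # characteristic function of N(1, 1)
print(rho_distance(phi1, phi2))
```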


It is not difficult to show that $\rho$ is a bona fide metric on $\mathscr P$. I can also prove that $d$ and $\rho$ are equivalent; that is, they generate the same topology on $\mathscr P$.


What I am curious about is whether $d$ and $\rho$ are strongly equivalent. That is, do there exist positive constants $A$ and $B$ such that $$A\rho(\mathbb P_1,\mathbb P_2)\leq d(\mathbb P_1,\mathbb P_2)\leq B\rho(\mathbb P_1,\mathbb P_2)\quad\text{for all $\mathbb P_1,\mathbb P_2\in\mathscr P$}?$$

Any suggestions on proof strategies or on ways to construct a counterexample are much appreciated.


There is one answer below.


I conducted some numerical experiments, and while I have not been able to produce a rigorous disproof, the results suggest that the metrics $d$ and $\rho$ are not strongly equivalent.

To see this, for each $n\in\mathbb N$, let $\mathbb P_n$ denote the probability measure corresponding to the normal distribution with mean $0$ and variance $1/n^2$. The corresponding characteristic function is $$\varphi_n(t)=\exp\left(-\frac{t^2}{2n^2}\right)\quad\text{for each $t\in\mathbb R$}.$$ Furthermore, let $\mathbb P$ denote the unit mass at $0$, whose characteristic function is $\varphi(t)=1$ for each $t\in\mathbb R$. Let $F_n$ and $F$ denote the corresponding distribution functions; note that $F(x)=\mathsf I_{[0,\infty)}(x)$ for $x\in\mathbb R$.

Now, $\rho(\mathbb P_n,\mathbb P)$ can be computed by maximizing the function $$t\mapsto\frac{1-\exp\left[-\frac{t^2}{2n^2}\right]}{1+|t|}.$$ Next, I claim that the Lévy distance $d(\mathbb P_n,\mathbb P)$ equals the unique solution $\varepsilon_n^*$ of the equation $$F_n(\varepsilon_n^*)=1-\varepsilon_n^*.$$ (It is easy to see that this equation has exactly one solution and that it is positive.)

To see that $\varepsilon_n^*$ is a lower bound, suppose that $\varepsilon>0$ satisfies $$F_n(x-\varepsilon)-\varepsilon\leq F(x)\leq F_n(x+\varepsilon)+\varepsilon\quad\text{for each $x\in\mathbb R$}.$$ If it were the case that $\varepsilon_n^*>\varepsilon$, then taking $x=\varepsilon_n^*-\varepsilon>0$ would give $$1=F(x)\leq F_n(x+\varepsilon)+\varepsilon=F_n(\varepsilon_n^*)+\varepsilon=1-(\varepsilon_n^*-\varepsilon)<1,$$ a contradiction. Hence $\varepsilon_n^*\leq\varepsilon$, and taking the infimum over all such $\varepsilon$ yields $\varepsilon_n^*\leq d(\mathbb P_n,\mathbb P)$.

To show that $d(\mathbb P_n,\mathbb P)\leq\varepsilon_n^*$, it suffices to check that $\varepsilon_n^*$ belongs to the set over which the infimum is taken. This is indeed the case: if $x\geq0$, then $$F_n(x-\varepsilon_n^*)-\varepsilon_n^*\leq 1=F(x)=1-\varepsilon_n^*+\varepsilon_n^*=F_n(\varepsilon_n^*)+\varepsilon_n^*\leq F_n(x+\varepsilon_n^*)+\varepsilon_n^*,$$ whereas if $x<0$, then $$F_n(x-\varepsilon_n^*)-\varepsilon_n^*\leq F_n(-\varepsilon_n^*)-\varepsilon_n^*=0=F(x)\leq F_n(x+\varepsilon_n^*)+\varepsilon_n^*,$$ where I used the symmetry relation $F_n(-y)=1-F_n(y)$ for $y\in\mathbb R$, which gives $F_n(-\varepsilon_n^*)=1-F_n(\varepsilon_n^*)=\varepsilon_n^*$.
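In case it is useful, here is a rough Python/SciPy sketch of how one could reproduce these two quantities for each $n$ (my own calculations were done in Mathematica; the root-finder, the bounded optimizer, and the function names below are illustrative choices, not the original code):

```python
import numpy as np
from scipy import optimize, stats

def levy_to_point_mass(n):
    """d(P_n, P) for P_n = N(0, 1/n^2) and P the unit mass at 0,
    obtained as the root of F_n(eps) = 1 - eps on (0, 1)."""
    F_n = stats.norm(loc=0.0, scale=1.0 / n).cdf
    return optimize.brentq(lambda e: F_n(e) - (1.0 - e), 0.0, 1.0)

def rho_to_point_mass(n):
    """rho(P_n, P): maximize (1 - exp(-t^2 / (2 n^2))) / (1 + t) over t >= 0."""
    f = lambda t: -(1.0 - np.exp(-t**2 / (2.0 * n**2))) / (1.0 + t)
    res = optimize.minimize_scalar(f, bounds=(0.0, 100.0 * n), method="bounded")
    return -res.fun

for n in (1, 10, 100, 1000):
    d = levy_to_point_mass(n)
    r = rho_to_point_mass(n)
    print(n, r, d, r / d)
```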

Now, numerical calculations show the following (note that $\mathbb P_n\Rightarrow\mathbb P$ weakly, so that the sequence of distances according to either metric converges to $0$): $$\begin{array}{r|r|r|r}
n&\rho(\mathbb P_n,\mathbb P)&d(\mathbb P_n,\mathbb P)&\rho(\mathbb P_n,\mathbb P)/d(\mathbb P_n,\mathbb P)\\\hline
1&0.2883&0.3596&0.8019\\
10&0.0425&0.1183&0.3591\\
100&0.0045&0.0204&0.2193\\
1\mathord,000&[\text{small}]&[\text{small}]&0.1626\\
10\mathord,000&[\text{small}]&[\text{small}]&0.1328\\
100\mathord,000&[\text{small}]&[\text{small}]&0.1143\\
1\mathord,000\mathord,000&[\text{small}]&[\text{small}]&0.0000
\end{array}$$

I suspect that the sudden drop in the ratio from 100,000 to 1,000,000 is due to rounding error in Wolfram Mathematica, in which the calculations were performed. At any rate, I considered other examples involving more precise numerical estimates, and those, too, suggest that the ratio $\rho/d$ can be made arbitrarily small, contradicting strong equivalence.

For example, if $X$ is a random variable with density function $x\mapsto\exp(-|x|)/2$ (the standard Laplace distribution), $\mathbb P_n$ is the distribution of $X/n$ for $n\in\mathbb N$, and $\mathbb P$ is again the unit mass at $0$, then I obtain the following table (these estimates rely on partial analytical solutions rather than on sheer numerical precision): $$\begin{array}{r|r}
n&\rho(\mathbb P_n,\mathbb P)/d(\mathbb P_n,\mathbb P)\\\hline
10^0&0.7874\\
10^1&0.3439\\
10^2&0.1730\\
10^3&0.1069\\
10^4&0.0755\\
10^5&0.0577\\
10^6&0.0465\\
10^7&0.0388\\
10^8&0.0333\\
10^9&0.0291\\
10^{10}&0.0258\\
10^{20}&0.0120\\
10^{30}&0.0078\\
10^{40}&0.0058\\
10^{50}&0.0046\\
10^{60}&0.0038\\
10^{70}&0.0032\\
10^{80}&0.0028\\
10^{90}&0.0025\\
10^{100}&0.0022
\end{array}$$
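A quick numerical cross-check of these ratios for moderate $n$ can be done along the same lines as before: the characteristic function of the standard Laplace distribution is $1/(1+t^2)$, so that of $X/n$ is $1/(1+t^2/n^2)$, and since the Laplace distribution is continuous and symmetric about $0$, the same symmetry argument as above gives $d(\mathbb P_n,\mathbb P)$ as the root of $F_n(\varepsilon)=1-\varepsilon$. The SciPy-based helper below is only a sketch with my own naming and solver choices:

```python
import numpy as np
from scipy import optimize, stats

def ratio_laplace(n):
    """rho(P_n, P) / d(P_n, P) for P_n the law of X/n, X standard Laplace,
    and P the unit mass at 0."""
    # Levy distance: root of F_n(eps) = 1 - eps, by the symmetry argument above.
    F_n = stats.laplace(loc=0.0, scale=1.0 / n).cdf
    d = optimize.brentq(lambda e: F_n(e) - (1.0 - e), 0.0, 1.0)

    # rho: maximize (1 - phi_n(t)) / (1 + t) with phi_n(t) = 1 / (1 + (t/n)^2).
    f = lambda t: -(1.0 - 1.0 / (1.0 + (t / n) ** 2)) / (1.0 + t)
    rho = -optimize.minimize_scalar(f, bounds=(0.0, 100.0 * n), method="bounded").fun
    return rho / d

for n in (1, 10, 100, 1000):
    print(n, ratio_laplace(n))
```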

In fact, Mathematica could compute the limit of the ratio analytically, and it did turn out to be $0$ (although the convergence is very slow, as the table shows). However, the expressions involved are so abstruse that I have refrained from verifying the value of the limit rigorously.