If $(E,d)$ is compact and $(f_k)⊆C(E)$ is dense, then $\operatorname E[1∧d(X_n,X)]\to0$ iff $\|f_k(X_n)-f_k(X)\|_{L^1}\to0$ for all $k∈ℕ$


Let $(E,d)$ be a compact metric space and $(f_k)_{k\in\mathbb N}\subseteq C(E)$ be dense. We can show that $$d(x_n,x)\xrightarrow{n\to\infty}0\Leftrightarrow\forall k\in\mathbb N:f_k(x_n)\xrightarrow{n\to\infty}f_k(x)\tag1$$ for all $(x_n)_{n\in\mathbb N}\subseteq E$ and $x\in E$.

Now let $(\Omega,\mathcal A,\operatorname P)$ be a probability space. Are we able to show that $$\operatorname E\left[1\wedge d(X_n,X)\right]\xrightarrow{n\to\infty}0\Leftrightarrow\forall k\in\mathbb N:\left\|f_k(X_n)-f_k(X)\right\|_{L^1(\operatorname P)}\xrightarrow{n\to\infty}0\tag2$$ for all $(\mathcal A,\mathcal B(E))$-measurable $X_n,X:\Omega\to E$ for $n\in\mathbb N$?

I wonder if this is somehow immediate from $(1)$. But since I don't see how, I've tried to mimic the proof of $(1)$. So, let's consider the "$\Leftarrow$" direction in $(2)$: As in the proof of $(1)$, we are able to show that $$\left\|f(X_n)-f(X)\right\|_{L^1(\operatorname P)}\xrightarrow{n\to\infty}0\tag3$$ for all $f\in C(E)$.

However, in the proof of the $\Leftarrow$ direction of $(1)$, we would now note that, given $\varepsilon>0$, $$f:=\frac{d\left(\;\cdot\;,{B_\varepsilon(x)}^c\right)}{d\left(\;\cdot\;,{B_\varepsilon(x)}^c\right)+d\left(\;\cdot\;,\overline B_{\varepsilon/2}(x)\right)}$$ is a Urysohn function for ${B_\varepsilon(x)}^c$ and $\overline B_{\varepsilon/2}(x)$. Now by the result corresponding to $(3)$, $$f(x_n)\xrightarrow{n\to\infty}f(x)=1\tag4$$ and hence $$f(x_n)>0\;\;\;\text{for all }n\ge N\tag5$$ for some $N\in\mathbb N$ which yields $$x_n\in B_\varepsilon(x)\;\;\;\text{for all }n\ge N.\tag6$$
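To make this step concrete, here is a minimal numeric sketch of the Urysohn function, assuming the hypothetical case $E=[0,1]$ with $d(x,y)=|x-y|$, where both distance terms have a closed form:

```python
import numpy as np

# Numeric sketch on the hypothetical case E = [0, 1] with d(x, y) = |x - y|.
# Here dist(y, B_eps(x)^c) = max(0, eps - |y - x|) and
# dist(y, closed B_{eps/2}(x)) = max(0, |y - x| - eps/2), so the Urysohn
# function f has the closed form below.
x, eps = 0.5, 0.3

def f(y):
    d_outer = np.maximum(0.0, eps - np.abs(y - x))      # dist(y, B_eps(x)^c)
    d_inner = np.maximum(0.0, np.abs(y - x) - eps / 2)  # dist(y, closed B_{eps/2}(x))
    return d_outer / (d_outer + d_inner)                # never 0/0: the two sets are disjoint

ys = np.linspace(0.0, 1.0, 1001)
assert np.all(f(ys[np.abs(ys - x) <= eps / 2]) == 1.0)  # f = 1 on closed B_{eps/2}(x)
assert np.all(f(ys[np.abs(ys - x) >= eps]) == 0.0)      # f = 0 on B_eps(x)^c
```

So $f(x_n)\to f(x)=1$ forces $x_n$ into $B_\varepsilon(x)$ eventually, exactly as in $(4)$–$(6)$.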

In the probabilistic setting there is no immediate analogue to this approach. So, how do we need to proceed? Do we even need to start from scratch or is $(2)$ somehow obvious from $(1)$?

EDIT: Maybe we need to replace the condition on the right-hand side of $(2)$ by $$\sum_{k\in\mathbb N}a_k\left(1\wedge\left\|f_k(X_n)-f_k(X)\right\|_{L^1(\operatorname P)}\right)\xrightarrow{n\to\infty}0\tag7$$ for some $(a_k)_{k\in\mathbb N}\subseteq(0,\infty)$ with $\sum_{k\in\mathbb N}a_k<\infty$.

We may at least note that $$\sum_{k\in\mathbb N}a_k\operatorname E\left[1\wedge\left|f_k(X_n)-f_k(X)\right|\right]=\operatorname E\left[\rho(X_n,X)\right]\tag8$$ by the dominated convergence theorem for all $n\in\mathbb N$, where $$\rho(x,y):=\sum_{k\in\mathbb N}a_k(1\wedge|f_k(x)-f_k(y)|)\;\;\;\text{for }x,y\in E$$ is a metric on $E$ equivalent to $d$.
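Identity $(8)$ is just the interchange of the (nonnegative) series and the expectation. A quick Monte Carlo sketch, assuming the hypothetical case $E=[0,1]$, $f_k(x)=x^k$, $a_k=2^{-k}$, and truncating the series at $K$ terms:

```python
import numpy as np

# Sketch of identity (8) on the hypothetical case E = [0, 1], f_k(x) = x**k,
# a_k = 2**(-k), truncating the series at K terms.
rng = np.random.default_rng(0)
K = 30
ks = np.arange(1, K + 1)
a = 0.5 ** ks
X = rng.random(100_000)
Xn = np.clip(X + 0.1, 0.0, 1.0)  # some other E-valued random variable

# terms[i, k] = 1 ∧ |f_k(Xn_i) - f_k(X_i)|
terms = np.minimum(1.0, np.abs(Xn[:, None] ** ks - X[:, None] ** ks))

lhs = float(np.sum(a * terms.mean(axis=0)))  # sum_k a_k E[1 ∧ |f_k(Xn) - f_k(X)|]
rhs = float((terms @ a).mean())              # E[rho(Xn, X)]
assert abs(lhs - rhs) < 1e-8                 # the sum and the expectation interchange
```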

Now we may note the following: writing $2(a\wedge b)=a+b-|a-b|$ for all $a,b\in\mathbb R$, we see that $2\operatorname E[1\wedge Y]=1+\operatorname E[Y]-\operatorname E[|1-Y|]$ and $2(1\wedge\operatorname E[Y])=1+\operatorname E[Y]-|1-\operatorname E[Y]|$ for every integrable real-valued random variable $Y$ on $(\Omega,\mathcal A,\operatorname P)$. Since $|1-\operatorname E[Y]|=|\operatorname E[1-Y]|\le\operatorname E[|1-Y|]$, we conclude that $\operatorname E[1\wedge Y]-1\wedge\operatorname E[Y]\le0$.
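The same inequality also follows from Jensen's inequality, since $y\mapsto1\wedge y$ is concave. A quick numeric sanity check (with an arbitrarily chosen nonnegative distribution as a stand-in for $Y$):

```python
import numpy as np

# Sanity check of E[1 ∧ Y] <= 1 ∧ E[Y] for a nonnegative Y
# (exponential chosen arbitrarily as a stand-in).
rng = np.random.default_rng(0)
Y = rng.exponential(scale=1.3, size=100_000)

# the identity 2(a ∧ b) = a + b - |a - b| used in the argument above
assert np.allclose(2 * np.minimum(1.0, Y), 1 + Y - np.abs(1 - Y))

lhs = np.minimum(1.0, Y).mean()  # E[1 ∧ Y]
rhs = min(1.0, Y.mean())         # 1 ∧ E[Y]
assert lhs <= rhs
```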

This yields $$\operatorname E[\rho(X_n,X)]\le\sum_{k\in\mathbb N}a_k\left(1\wedge\operatorname E\left[|f_k(X_n)-f_k(X)|\right]\right)\xrightarrow{n\to\infty}0\tag9$$ and hence, by Markov's inequality, $$\rho(X_n,X)\xrightarrow{n\to\infty}0\;\;\;\text{in probability}.\tag{10}$$ This would be enough to conclude (by the dominated convergence theorem) if we could replace $\rho$ by $d$ in $(10)$. While these metrics are equivalent (i.e. they generate the same topology), I have to admit that I'm not sure whether convergence in probability depends on the metric chosen (it should be the same for strongly equivalent metrics, though).

EDIT 2: I guess convergence in probability depends only on the generated topology, but does anyone have a reference?

Best answer:

The $\Rightarrow$ part of (2) seems to be easy. We have $d(X_n,X)\to 0$ in probability. Look at the sequence of pairs $Y_n = (X_n,X)$ taking values in $(E\times E, d_2)$, where $d_2((r,s),(t,u))=d(r,t)+d(s,u)$, and $Y=(X,X)$. We have $d_2(Y_n,Y)\to0$ in probability, so the distribution of $Y_n$ converges weakly to the distribution of $Y$. That is, for any $g\in C(E\times E)$ we have $\operatorname Eg(Y_n)\to \operatorname Eg(Y)$. In particular, taking $g$ of the form $g(s,t)=|f_k(s)-f_k(t)|$ gives $\operatorname E|f_k(X_n)-f_k(X)|\to\operatorname E|f_k(X)-f_k(X)|=0$, which is the right-hand side of (2).
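A Monte Carlo sketch of this direction, assuming the hypothetical case $E=[0,1]$ with $X$ uniform and $X_n=(X+1/n)\wedge1$, so that $\operatorname E[1\wedge d(X_n,X)]\le1/n\to0$:

```python
import numpy as np

# Sketch of the "⇒" direction on the hypothetical case E = [0, 1]:
# X_n -> X forces E|f(X_n) - f(X)| -> 0 for a continuous test function f.
rng = np.random.default_rng(0)
X = rng.random(100_000)

def err(n, f):
    Xn = np.clip(X + 1.0 / n, 0.0, 1.0)  # d(X_n, X) <= 1/n surely
    return float(np.abs(f(Xn) - f(X)).mean())

f = np.cos  # one continuous (in fact 1-Lipschitz) test function on E
e_f = [err(n, f) for n in (10, 100, 1000)]
assert e_f[0] > e_f[1] > e_f[2]  # E|f(X_n) - f(X)| decreases with n
assert e_f[2] < 1e-3             # bounded by E[d(X_n, X)] <= 1/n since f is 1-Lipschitz
```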

As for the $\Leftarrow$ direction, a hint from Augustos Santos suggested this argument. First, the density of the $f_k$ in $C(E)$ means that the convergence $\operatorname E|f(X_n)-f(X)|\to 0$ holds for all $f\in C(E)$ by a standard approximation argument: $f$ is uniformly $\varepsilon/4$-approximated by some $f_k$, and for all $n$ sufficiently large, $\operatorname E|f_k(X_n)-f_k(X)|<\varepsilon/2$, so $\operatorname E|f(X_n)-f(X)|<\varepsilon$ by the triangle inequality.

Let $(g_k)$ be dense in the set of all Lipschitz functions in $C(E)$ with Lipschitz constant $1$, so that $|g_k(x)-g_k(y)|\le d(x,y)$ on $E\times E$. Now define $\delta(x,y)=\sum_k 2^{-k}|g_k(x)-g_k(y)|$. Clearly $\delta(x,y)\le d(x,y)$, so $d(x_n,x)\to0$ implies $\delta(x_n,x)\to 0$.

Suppose now that $\delta(x_n,x)\to 0$. Then for every $k$ we have $g_k(x_n)\to g_k(x)$. By compactness of $(E,d)$, the sequence $(x_n)$ has limit points. Suppose there were two distinct ones, $a\ne b$, so that $d(a,b)>0$. The continuous $1$-Lipschitz function $x\mapsto d(x,a)$ separates them, so by density there is some $g_k$ for which $g_k(a)\ne g_k(b)$. Since $\delta(x_n,x)\to 0$ we have, for that $k$, $g_k(x_n)\to g_k(x)$ along the full sequence. But on one subsequence $g_k(x_n)\to g_k(a)$, and on another $g_k(x_n)\to g_k(b)$, so $g_k(a)=g_k(x)$ and $g_k(b)=g_k(x)$, a contradiction. Hence there cannot be two distinct $d$-limit points of $(x_n)$, and so $d(x_n,x)\to 0$. That is, $\delta$ and $d$ are equivalent metrics.
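Here is a small numeric sketch of the metric $\delta$, assuming the hypothetical case $E=[0,1]$ and using the $1$-Lipschitz distance functions $g_k=d(\cdot\,,q_k)$ for a finite grid $(q_k)$ standing in for a dense sequence:

```python
import numpy as np

# Sketch of delta on the hypothetical case E = [0, 1], with g_k(y) = |y - q_k|
# for grid points q_k standing in for a dense sequence, and weights 2**(-k).
qs = np.linspace(0.0, 1.0, 64)
w = 0.5 ** np.arange(1, len(qs) + 1)

def delta(x, y):
    return float(np.sum(w * np.abs(np.abs(x - qs) - np.abs(y - qs))))

# delta <= d: each |g_k(x) - g_k(y)| <= |x - y| and the weights sum to < 1
rng = np.random.default_rng(1)
for x, y in rng.random((1000, 2)):
    assert delta(x, y) <= abs(x - y) + 1e-12

# delta(x_n, x) -> 0 along a sequence x_n -> x
x0 = 0.3
vals = [delta(x0 + 1.0 / n, x0) for n in range(2, 200)]
assert vals[-1] < vals[0] and vals[-1] < 1e-2
```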

Then the right-hand side of (2) implies $\delta(X_n,X)\to 0$ in probability, which (by the OP's EDIT 2 remark) means $X_n\to X$ in probability, and hence $\operatorname E[1\wedge d(X_n,X)]\to0$ by the dominated convergence theorem. (See Kallenberg, Foundations of Modern Probability, 2nd ed., Chapter 4.)