Proof that the discrete probability measures are dense in the space of all Borel probability


In the context of Bayesian nonparametrics, I want to prove that the discrete probability measures with finite support are dense in the space of all Borel probability measures on a Polish space (or $\mathbb{R}^d$), relative to the weak topology. According to my notes, this is the topology of convergence in distribution (I haven't worked much with this topology before).

I couldn't find a proof of this proposition online. A very similar question was asked here: In the space of probability distributions, is the set of discrete distributions dense?, but I can't access the book that is referenced.

So given a measure $\mu$ I want to construct a sequence of discrete measures $\mu_n$ with finite support such that for all $x$ at which $F$ is continuous, we have:

$$F_n(x) \to F(x),$$

where $F_n$ and $F$ denote the cumulative distribution functions corresponding to $\mu_n$ and $\mu$.

I think I have a proof in the case of $\mathbb{R}$:

For $n\in\mathbb{N}$, define support points $p_1,\dots,p_{n-1}$ by $p_j = F^{-1}(\frac{j}{n})$, and set $\mu_n(A) = \frac{1}{n-1}\sum_{j=1}^{n-1} \mathbb{1}_A(p_j)$, where $\mathbb{1}_A$ denotes the indicator function. Now, assuming $F$ is continuous at $p_j$, we have $F(p_j) = \frac{j}{n}$ while $F_n(p_j) = \frac{j}{n-1}$, so $|F_n(p_j) - F(p_j)| = \frac{j}{n(n-1)} \le \frac{1}{n}$. At all other points $x\in[p_1,p_{n-1}]$, say $p_k \le x < p_{k+1}$, the fact that $F(p_{k+1}) - F(p_k) = \frac{1}{n}$ gives $|F_n(x) - F(x)| \le \frac{1}{n}$ as well, and the difference is similarly bounded in the tails. I will need to add a few lines to account for discontinuities of $F$, but otherwise I think this works.
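As a quick numerical sanity check of this construction (purely illustrative, and all names below are my own), one can take $\mu = \mathrm{Exp}(1)$, whose quantile function $F^{-1}(u) = -\log(1-u)$ is explicit, and verify that the sup-distance between $F_n$ and $F$ stays below $\frac{1}{n}$:

```python
import math

def F(x):
    """CDF of Exp(1): F(x) = 1 - e^{-x} for x >= 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def F_inv(u):
    """Quantile function of Exp(1)."""
    return -math.log(1.0 - u)

def make_support(n):
    """Support points p_j = F^{-1}(j/n), j = 1, ..., n-1, each carrying mass 1/(n-1)."""
    return [F_inv(j / n) for j in range(1, n)]

def F_n(x, support):
    """CDF of the discrete approximation: fraction of atoms at or below x."""
    return sum(1 for p in support if p <= x) / len(support)

n = 200
support = make_support(n)
# evaluate the deviation on a grid covering the support and both tails
grid = [i * 0.01 for i in range(0, 1001)]
max_dev = max(abs(F_n(x, support) - F(x)) for x in grid)
print(max_dev, 1.0 / n)  # the observed deviation never exceeds 1/n
```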

Is this the right way to go? I also have trouble extending this approach to higher dimensions. Any suggestions, or pointers to an actual proof of this theorem?

Accepted answer:

There is a proof of a stronger theorem in

Varadarajan, V. S. "On the convergence of sample probability distributions." Sankhyā: The Indian Journal of Statistics (1933-1960) 19.1/2 (1958): 23-26.

Theorem: Let $\mu$ be a probability measure on a separable metrizable space $X$ and let $\langle g_n\rangle$ be an independent, identically distributed sequence of random variables, each with distribution $\mu$. Then almost surely, the sequence of (random!) sample distributions $\langle \mu_n\rangle$ given by $$\mu_n(B)=n^{-1}\#\{m:m\leq n, g_m\in B\}$$ converges weakly to $\mu$.
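This convergence is easy to observe numerically. A toy sketch (function names are my own) for $\mu = \mathrm{Uniform}(0,1)$ on $\mathbb{R}$: the sup-distance between the sample CDF and $F(x) = x$ shrinks as $n$ grows, which is even stronger than the pointwise convergence needed for weak convergence:

```python
import random

random.seed(0)

def empirical_cdf_dev(samples):
    """Sup-norm distance between the empirical CDF of the samples
    and the Uniform(0,1) CDF F(x) = x (the Kolmogorov-Smirnov statistic)."""
    xs = sorted(samples)
    n = len(xs)
    dev = 0.0
    for i, x in enumerate(xs):
        # the empirical CDF jumps from i/n to (i+1)/n at x; the true CDF is x
        dev = max(dev, abs((i + 1) / n - x), abs(i / n - x))
    return dev

devs = {}
for n in [100, 1000, 10000]:
    samples = [random.random() for _ in range(n)]
    devs[n] = empirical_cdf_dev(samples)
    print(n, devs[n])
```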

The hard part of the proof is showing that there exists a countable family $\mathcal{C}$ of bounded continuous functions on $X$ such that a sequence $\langle \nu_n\rangle$ converges weakly to $\nu$ if and only if $\langle\int f~\mathrm d\nu_n\rangle$ converges to $\int f~\mathrm d\nu$ for all $f\in\mathcal{C}$. This part is essentially equivalent to showing that the topology of weak convergence has a countable basis in this case.

To finish the proof, note that by the strong law of large numbers, for each bounded continuous function $f$ the sequence of sample distributions satisfies $\lim_n \int f~\mathrm d\mu_n=\int f~\mathrm d\mu$ almost surely. In particular, for each $f\in\mathcal{C}$ this holds outside a set of measure zero, and since a countable union of measure-zero sets again has measure zero, the theorem follows.
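The strong-law step can also be checked directly: integrating a bounded continuous $f$ against the sample distribution $\mu_n$ is just averaging $f$ over the sample. A toy sketch (my own choices) with $\mu = \mathrm{Uniform}(0,1)$ and $f = \cos$, whose true integral is $\sin 1$:

```python
import math
import random

random.seed(1)

def f(x):
    """A bounded continuous test function."""
    return math.cos(x)

# mu = Uniform(0, 1); the integral of cos over [0, 1] is sin(1)
true_integral = math.sin(1.0)

n = 100000
samples = [random.random() for _ in range(n)]
# integrating f against the sample distribution mu_n is the sample mean of f
approx = sum(f(x) for x in samples) / n
print(approx, true_integral)
```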