Distance between empirical distribution and population


Suppose we have $N$ points $z_1, \dots, z_N$. From this we can form the empirical distribution $\mu_z = \frac{1}{N} \sum_{i=1}^N \delta_{z_i}$.

Now suppose we sample $N$ i.i.d. points from $\mu_z$, i.e. $y_i \sim \mu_z$ for all $i=1,\dots, N$. Every $y_i$ is one of the points $\{z_i\}_{i=1}^N$, so the set $\{y_i\}_{i=1}^N$ is contained in the support of $\mu_z$, but some of the $z_i$ may be repeated and others omitted.

Let $\mu_y = \frac{1}{N} \sum_{i=1}^N \delta_{y_i}$ be the empirical distribution for the set of points $\{y_i\}_{i=1}^N$.
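To make the setup concrete, here is a minimal sketch of the construction, assuming the points are real numbers (the question itself does not fix a dimension). In one dimension, $W_1$ between two empirical measures with the same number of equally weighted atoms is the average absolute difference of the sorted samples, which is what the hypothetical helper `w1_equal_size` computes. The names `resample` and `w1_equal_size`, and the Gaussian choice for the $z_i$, are illustrative assumptions only.

```python
import random

def resample(z, rng):
    """Draw len(z) i.i.d. points from the empirical distribution of z."""
    return [rng.choice(z) for _ in z]

def w1_equal_size(a, b):
    """W_1 between two 1-D empirical measures, each with len(a) == len(b)
    equally weighted atoms: match sorted samples and average the gaps."""
    if len(a) != len(b):
        raise ValueError("both samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

rng = random.Random(0)                       # fixed seed for reproducibility
N = 200
z = [rng.gauss(0.0, 1.0) for _ in range(N)]  # assumed example points z_1..z_N
y = resample(z, rng)                         # y_i ~ mu_z, i.i.d.
d = w1_equal_size(y, z)                      # W_1(mu_y, mu_z)
```

Here `y` only ever contains values already present in `z`, so `d` can never exceed the diameter `max(z) - min(z)` of the point set.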

My question is, can we bound the Wasserstein distance between $\mu_y$ and $\mu_z$? Or at least the expectation of the Wasserstein distance? What conditions on the points $\{z_i\}_{i=1}^N$ would be relevant? It makes sense that the more spread out the points are, the larger the bound/expectation would be. I also know that the distance between the empirical distribution of $N$ samples and the population distribution they were drawn from is $O(1/\sqrt{N})$ (from this question). But I believe that bound relies on assumptions about the population distribution that $\mu_z$ may not satisfy.
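As a rough way to probe the expectation numerically, the following sketch averages $W_1(\mu_y, \mu_z)$ over many resamples. It assumes one-dimensional points and uses the sorted-sample formula for $W_1$ between equally sized, equally weighted samples; the function names, the Gaussian example data, and the trial count are all hypothetical choices, not part of the question. It is only an experiment, not a bound.

```python
import random

def w1_equal_size(a, b):
    """1-D W_1 between two equally sized, equally weighted samples."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def mean_w1(z, trials, rng):
    """Monte Carlo estimate of E[W_1(mu_y, mu_z)] for fixed points z."""
    total = 0.0
    for _ in range(trials):
        y = [rng.choice(z) for _ in z]   # N i.i.d. draws from mu_z
        total += w1_equal_size(y, z)
    return total / trials

data_rng = random.Random(0)
z = [data_rng.gauss(0.0, 1.0) for _ in range(100)]  # assumed example points
z_wide = [2.0 * x for x in z]                       # same points, twice the spread

# Using the same seed draws the same resampling indices for both point sets.
est = mean_w1(z, trials=200, rng=random.Random(1))
est_wide = mean_w1(z_wide, trials=200, rng=random.Random(1))
```

Because $W_1$ is homogeneous under scaling of the points, `est_wide` comes out as twice `est` here, which matches the intuition that the distance grows with the spread of the $z_i$. Repeating the experiment for several values of $N$ would give an empirical feel for the rate.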