Prove a large number of samplings converge to a set of small number of samples

76 Views Asked by At

I don't know how to start with this problem.


UPDATE:

Thanks for the comments from @BruceET and @felipeh, I update the problem as follows:

In a $k$-dimensional space, given a set of $n$ points: $X=\{x_1, x_2,..., x_n\}$ which is sampled from an known distribution $P$. Given a set of $m$ points ($m << n$): $Y=\{y_1,y_2,...,y_m\}$ which is also drawn from $P$. Suppose $\{z_1, z_2,...,z_m\} \subset X$ are the closest points (in term of Euclidean distance) of $\{y_1,y_2,...,y_m\}$ respectively. A gap $g$ is defined as follows:

$g = max\{|z_1 - y_1|^2, |z_2 - y_2|^2,..., |z_m - y_m|^2\}$

where $|a-b|^2$ denotes the Euclidean distance between a, b.

Prove that if $m$ is fixed, $g \rightarrow 0$ as $n \rightarrow \infty$.


Thanks @Henry for the comment, I prove it as follows, could you please check whether it is correct? I do it in a uniform distribution.

Suppose $g = max\{|z_1 - y_1|^2, |z_2 - y_2|^2,..., |z_m - y_m|^2\} = \| z - y \|_{max}$.

For any $\varepsilon > 0$, put a hypersphere radius $\varepsilon$ around $y$ such that $y$ is the centroid. (*)

Suppose there is no sample of $X$ exists in the hypersphere as $n \rightarrow \infty$. (**)

We cut the space by a line $d$ through $y$. Then, we project samples of $X$, which distance to $d$ is less than $\delta$ ($0 < \delta \leq \varepsilon$), on $d$ and denote these projected samples as $x_d$. Suppose the boundary of $d$ is $[a, b]$, i.e. $a \leq x_d \leq b$. According to the law of large numbers theorem, $\overline{x_d}$ (the average of samples projected on $d$) $\rightarrow (a + b)/2$ as $n \rightarrow \infty$. Now if there is no sample in the hypersphere:

\begin{equation} \begin{split} \overline{x_d} \rightarrow \int_a^b \mathrm{\frac{x}{b-a}}\,\mathrm{d}x = \int_a^{y_d - \delta} \mathrm{\frac{x}{b-a}}\,\mathrm{d}x + \int_{y_d + \delta}^b \mathrm{\frac{x}{b-a}}\,\mathrm{d}x \\ = \frac{a+b}{2} - \frac{2\delta y_d}{b - a} \neq \frac{a + b}{2} \end{split} \end{equation} The above equation conflicts the law of large numbers theorem, this indicates that (**) is false; so there exists at least one sample in the hyperphere. This statement accompanies with (*) indicate that $g = \| z - y \|_{max} \rightarrow 0$ as $n \rightarrow \infty$.