I am self-studying empirical process theory. I have encountered the covering number $N(\delta,\mathcal{G},P)$, as well as the empirical version $N(\delta,\mathcal{G},P_n)$. It seems intuitive to expect some kind of convergence: $$ N(\delta,\mathcal{G},P_n)\rightarrow N(\delta,\mathcal{G},P) $$ Yet, I have no idea how to prove this. Can such a result be shown? Or are there counterexamples?
Definitions
Covering number: Let $P$ be a probability measure on the Borel $\sigma$-algebra over $\mathbb{R}$. For $p\in[1,\infty)$, let $L^p(P)$ be the set of Borel-measurable mappings $f:\mathbb{R}\rightarrow\mathbb{R}$ for which $\int_\mathbb{R} |f|^p dP<\infty$. Let $\mathcal{G}$ be a totally bounded subset of $L^p(P)$. For $\delta>0$, the covering number of $\mathcal{G}$ is the smallest $N\in\mathbb{N}$ such that there exists a subset $G\subset \mathcal{G}$ with $N$ elements and the following property: for any $g\in\mathcal{G}$, there exists an $h\in G$ such that $||g-h||_p<\delta$, where $||f||_p=\left(\int_\mathbb{R}|f|^p dP\right)^{1/p}$. This number is denoted by $N(\delta,\mathcal{G},P)$.
Empirical measure: Let $P$ be as above. Let $\{X_n\}_{n\in\mathbb{N}}$ be a sequence of independent $P$-distributed random variables. If $\delta_{X_i}$ denotes the Dirac measure at $X_i$, the empirical measure $P_n$ is defined as: $$ P_n:\mathcal{B}(\mathbb{R})\rightarrow[0,1],\quad E\mapsto \frac{1}{n}\sum_{i=1}^n\delta_{X_i}(E) $$
The answer is no, due to problems with equivalence classes. We will look at two cases: in the first, the elements of $L^p(P)$ are understood as equivalence classes ($f=g$ if $P(f(x)=g(x))=1$); in the second, we take them as individual functions.
Case 1: Interpreting $L^p(P)$ as a set of equivalence classes.
In this case, calculating the empirical covering number makes no sense, unless $P$ is discrete.
To demonstrate that this is nonsense, let $\mathcal{B}$ be the set of all Borel-measurable mappings $\mathbb{R}\rightarrow\mathbb{R}$. Take $P$ as the Lebesgue measure on $[0,1]$ and $\mathcal{G}=\{[0]\}$. Here, $[0]$ is the set of all Borel-measurable mappings $\mathbb{R}\rightarrow\mathbb{R}$ which are $P$-almost surely equal to zero. Take a finite subset $\{x_1,\dots,x_n\}\subset \mathbb{R}$ of distinct points. For any $g\in\mathcal{B}$, $$ x\mapsto \sum_{i=1}^n g(x)1\{x=x_i\}\in[0] $$ because this function vanishes outside a finite set, which has Lebesgue measure zero. It also agrees with $g$ at every $x_i$, so its empirical $L^p$-distance to $g$ is zero. Therefore, for any $\delta>0$ and any $n\in\mathbb{N}$, the element $[0]$ can cover all of $\mathcal{B}$ in the empirical $L^p$-norm: $$ N(\delta,\mathcal{B},P_n)=1 $$ Trivially, $$ N(\delta,\mathcal{B},P)=+\infty $$ In fact, $\mathcal{B}$ is so large that we cannot even define a countable covering for it.
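The construction above can be checked numerically. The following is only a sketch, assuming NumPy; the names `empirical_lp_distance` and `restrict_to_points` are made up for this illustration, and the choice of $g$ is arbitrary. It builds $x\mapsto\sum_i g(x)1\{x=x_i\}$ for a sample from the uniform distribution on $[0,1]$ and confirms that its empirical distance to $g$ is zero, while it evaluates to zero at fresh draws.

```python
import numpy as np

def empirical_lp_distance(g, h, sample, p=2.0):
    """Empirical L^p(P_n) distance: ((1/n) * sum_i |g(x_i) - h(x_i)|^p)^(1/p)."""
    sample = np.asarray(sample, dtype=float)
    return float(np.mean(np.abs(g(sample) - h(sample)) ** p) ** (1.0 / p))

def restrict_to_points(g, points):
    """Return x -> g(x) * 1{x in points}; zero outside a finite set."""
    points = np.asarray(points, dtype=float)
    def h(x):
        x = np.asarray(x, dtype=float)
        return np.where(np.isin(x, points), g(x), 0.0)
    return h

rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1.0, size=5)   # observed x_1, ..., x_n
g = lambda x: np.exp(x) + 3.0            # arbitrary measurable function
h = restrict_to_points(g, sample)        # a representative of [0]

# h agrees with g on the sample, so the empirical distance vanishes.
dist = empirical_lp_distance(g, h, sample)

# At fresh uniform draws, h is zero (the draws miss the finite set).
fresh = rng.uniform(0.0, 1.0, size=1000)
h_is_zero_on_fresh = bool(np.all(h(fresh) == 0.0))
```

Here `dist` comes out as exactly zero because `h` and `g` take identical values at the sample points, which is the whole content of the argument.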
Case 2: Interpreting $L^p(P)$ as a set of functions.
In this case, there is also a counterexample based on equivalence classes. But now it goes the other way.
Let $\mathcal{G}$ be the set of functions parametrized by $\alpha\in [0,1]$, $\beta>0$, which map: $$ g_{\alpha,\beta}:[0,1]\rightarrow\mathbb{R},\quad x\mapsto \beta 1\{x=\alpha\} $$ Let $P$ be the Lebesgue measure on $[0,1]$. For any $g,h\in\mathcal{G}$, it holds that $g=h$ $P$-almost surely, so $||g-h||_p=0$ and any single element covers $\mathcal{G}$. So, for any $\delta>0$, $$ N(\delta,\mathcal{G},P)=1 $$ At the same time, suppose we have $n=1$ and we observe $x_1$. The empirical distance between elements of $\mathcal{G}$ with $\alpha=x_1$ is unbounded: $$ ||g_{x_1,\beta_1}-g_{x_1,\beta_2}||_{P_1}=|g_{x_1,\beta_1}(x_1)-g_{x_1,\beta_2}(x_1)|=|\beta_1-\beta_2| $$ So, $$ N(\delta,\mathcal{G},P_1)=+\infty,\quad\text{$P$-almost surely} $$ The same holds for any $n$: the sample points are $P$-almost surely distinct, so $$ ||g_{x_1,\beta_1}-g_{x_1,\beta_2}||_{P_n}=n^{-1/p}|\beta_1-\beta_2|, $$ which is again unbounded in $\beta_1,\beta_2$. Hence, $$ N(\delta,\mathcal{G},P_n)=+\infty,\quad\text{$P$-almost surely} $$
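A small numerical sketch of Case 2, assuming NumPy; the helper names `empirical_lp_distance` and `spike`, the sample size, and the values of $\beta$ and $\delta$ are all illustrative choices, not part of the argument. It checks that two spikes at the observed point $x_1$ are at empirical distance $n^{-1/p}|\beta_1-\beta_2|$, and that one can place arbitrarily many spikes pairwise at least $2\delta$ apart, so no finite $\delta$-cover exists.

```python
import numpy as np

def empirical_lp_distance(g, h, sample, p=2.0):
    """Empirical L^p(P_n) distance: ((1/n) * sum_i |g(x_i) - h(x_i)|^p)^(1/p)."""
    sample = np.asarray(sample, dtype=float)
    return float(np.mean(np.abs(g(sample) - h(sample)) ** p) ** (1.0 / p))

def spike(alpha, beta):
    """g_{alpha,beta}(x) = beta * 1{x == alpha}."""
    return lambda x: beta * (np.asarray(x) == alpha).astype(float)

rng = np.random.default_rng(0)
n, p = 5, 2.0
sample = rng.uniform(0.0, 1.0, size=n)   # distinct with probability one
x1 = sample[0]

# Two spikes at the observed point x1 with different heights.
dist = empirical_lp_distance(spike(x1, 1.0), spike(x1, 4.0), sample, p)
expected = n ** (-1.0 / p) * abs(1.0 - 4.0)

# K spikes at x1 whose heights are spaced so that any two of them are
# at empirical distance >= 2 * delta; K can be made as large as desired.
delta, K = 0.5, 6
betas = [2.0 * delta * n ** (1.0 / p) * k for k in range(K)]
family = [spike(x1, b) for b in betas]
min_pairwise = min(
    empirical_lp_distance(family[i], family[j], sample, p)
    for i in range(K) for j in range(i + 1, K)
)
```

Since the members of `family` are pairwise at least $2\delta$ apart, the triangle inequality implies that no open $\delta$-ball contains two of them, so any $\delta$-cover needs at least `K` centers. Under the Lebesgue measure, by contrast, all of these spikes are at distance zero from each other.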