Taken from the paper "A Universal Law of Robustness via Isoperimetry" by Bubeck and Sellke.
Theorem 3. Let $\mathcal{F}$ be a class of functions from $\mathbb{R}^{d} \rightarrow \mathbb{R}$ and let $\left(x_{i}, y_{i}\right)_{i=1}^{n}$ be i.i.d. input-output pairs in $\mathbb{R}^{d} \times[-1,1]$. Fix $\epsilon, \delta \in(0,1)$.
Assume that:
The function class can be written as $\mathcal{F}=\left\{f_{\boldsymbol{w}}, \boldsymbol{w} \in \mathcal{W}\right\}$ with $\mathcal{W} \subset \mathbb{R}^{p}$, $\operatorname{diam}(\mathcal{W}) \leq W$ and for any $\boldsymbol{w}_{1}, \boldsymbol{w}_{2} \in \mathcal{W}$, $$ \left\|f_{\boldsymbol{w}_{1}}-f_{\boldsymbol{w}_{2}}\right\|_{\infty} \leq J\left\|\boldsymbol{w}_{1}-\boldsymbol{w}_{2}\right\| $$
The distribution $\mu$ of the covariates $x_{i}$ can be written as $\mu=\sum_{\ell=1}^{k} \alpha_{\ell} \mu_{\ell}$, where each $\mu_{\ell}$ satisfies $c$-isoperimetry.
The expected conditional variance of the output is strictly positive, denoted $\sigma^{2}:=\mathbb{E}^{\mu}[\operatorname{Var}[y \mid x]]>0$.
Then, with probability at least $1-\delta$ with respect to the sampling of the data, one has simultaneously for all $f \in \mathcal{F}$ : $$ \frac{1}{n} \sum_{i=1}^{n}\left(f\left(x_{i}\right)-y_{i}\right)^{2} \leq \sigma^{2}-\epsilon \Rightarrow \operatorname{Lip}(f) \geq \frac{\epsilon}{2^{9} \sqrt{c}} \sqrt{\frac{n d}{p \log \left(60 W J \epsilon^{-1}\right)+\log (4 / \delta)}} . $$
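To get a feel for how the right-hand side of the implication scales, here is a minimal sketch that simply evaluates the formula from the theorem statement. The function name `lipschitz_lower_bound` and all numeric values are hypothetical, chosen only for illustration; they are not from the paper.

```python
import math

def lipschitz_lower_bound(n, d, p, W, J, eps, delta, c):
    """Evaluate the lower bound on Lip(f) stated in Theorem 3."""
    # Denominator under the square root: p*log(60*W*J/eps) + log(4/delta).
    log_term = p * math.log(60.0 * W * J / eps) + math.log(4.0 / delta)
    return eps / (2 ** 9 * math.sqrt(c)) * math.sqrt(n * d / log_term)

# Hypothetical problem sizes (assumptions, not taken from the paper).
common = dict(n=10**6, d=10**3, W=1.0, J=1.0, eps=0.1, delta=0.01, c=1.0)
few_params = lipschitz_lower_bound(p=10**4, **common)
many_params = lipschitz_lower_bound(p=10**9, **common)

print(f"p = 1e4: Lip(f) >= {few_params:.4g}")
print(f"p = 1e9: Lip(f) >= {many_params:.4g}")
```

With everything else held fixed, increasing $p$ shrinks the bound, which is the sense in which more parameters leave room for a smoother fit.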
Proof. Define $\mathcal{W}_{L} \subseteq \mathcal{W}$ by $\mathcal{W}_{L}=\left\{\boldsymbol{w} \in \mathcal{W}: \operatorname{Lip}\left(f_{\boldsymbol{w}}\right) \leq L\right\}$. Denote $\mathcal{W}_{L, \epsilon}$ for an $\frac{\epsilon}{6 J}$-net of $\mathcal{W}_{L}$. We have in particular $\left|\mathcal{W}_{\epsilon}\right| \leq\left(60 W J \epsilon^{-1}\right)^{p}$. We apply Theorem 2 to $\mathcal{F}_{L, \epsilon}=\left\{f_{\boldsymbol{w}}, \boldsymbol{w} \in \mathcal{W}_{L, \epsilon}\right\}$ :
This continues .........
I am curious about the size of an $\epsilon$-net of a subset of a vector space.
How is the bound $\left|\mathcal{W}_{\epsilon}\right| \leq\left(60 W J \epsilon^{-1}\right)^{p}$ obtained?
Since $\mathcal{W}_{L} \subset \mathcal{W}$ and $\mathrm{diam}(\mathcal{W}) \leq W$, the set $\mathcal{W}_{L}$ is contained in a Euclidean ball of radius $W$ (centered at any point of $\mathcal{W}$). Without loss of generality, assume this ball is centered at $0$. Furthermore, write $r \mathbf{B}_2^p$ for the Euclidean ball of radius $r$ in $p$ dimensions.
The $\varepsilon$-covering number of a Euclidean ball of radius $W$ in $p$ dimensions is at most (see Section 4.2 in Roman Vershynin's High-Dimensional Probability):
$$ \mathcal{N}(W\mathbf{B}_2^p, \varepsilon) \leq \left(1 + \frac{2W}{\varepsilon}\right)^p \leq \left( \frac{3 W}{\varepsilon} \right)^p, $$
where the second inequality holds whenever $\varepsilon \leq W$.
Substituting $\varepsilon := \frac{\epsilon}{6J}$ above gives (assuming $\epsilon \leq 6JW$)
$$ \mathcal{N}\left(W\mathbf{B}_2^p, \frac{\epsilon}{6J}\right) \leq \left(\frac{18 J W}{\epsilon} \right)^p. $$
Finally, use approximate monotonicity of covering numbers (Exercise 4.2.10 in the same book), which costs a factor of $2$ in the scale, to deduce
$$ \mathcal{W}_{L} \subset W \mathbf{B}_2^p \Rightarrow \mathcal{N}(\mathcal{W}_{L}, \epsilon / 6J) \leq \mathcal{N}(W\mathbf{B}_{2}^p, \epsilon / 12J) \leq \left( \frac{36 J W}{\epsilon} \right)^p \leq \left( \frac{60 W J}{\epsilon} \right)^p, $$
which is the bound used in the proof.
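As a numerical sanity check of the volumetric bound $\left(1 + \frac{2W}{\varepsilon}\right)^p$, here is a small sketch that greedily builds an $\varepsilon$-net. It is an illustration under stated assumptions, not part of the proof: the net is built for a finite random sample of the ball rather than the whole ball, and the helper names (`sample_ball`, `greedy_net`, `volumetric_bound`) and the values of `p`, `W`, `eps` are hypothetical.

```python
import math
import random

def sample_ball(p, radius, rng):
    """Draw one point uniformly from the Euclidean ball of given radius in R^p."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in g))
    r = radius * rng.random() ** (1.0 / p)  # radial law of the uniform distribution
    return [r * v / norm for v in g]

def greedy_net(points, eps):
    """Keep a point only if it is farther than eps from every point kept so far.

    The result is eps-separated, and every input point lies within eps of it.
    """
    net = []
    for x in points:
        if all(math.dist(x, c) > eps for c in net):
            net.append(x)
    return net

def volumetric_bound(p, radius, eps):
    """Upper bound (1 + 2*radius/eps)^p on the size of an eps-separated set."""
    return (1.0 + 2.0 * radius / eps) ** p

rng = random.Random(0)
p, W, eps = 2, 1.0, 0.25  # illustrative values (assumptions)
points = [sample_ball(p, W, rng) for _ in range(2000)]
net = greedy_net(points, eps)
bound = volumetric_bound(p, W, eps)
covered = all(min(math.dist(x, c) for c in net) <= eps for x in points)

print(f"net size = {len(net)}, volumetric bound = {bound:.0f}, covered = {covered}")
```

The greedy net is $\varepsilon$-separated, so disjoint balls of radius $\varepsilon/2$ around its points all fit inside a ball of radius $W + \varepsilon/2$; comparing volumes is what gives the $(1 + 2W/\varepsilon)^p$ bound the code checks against.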