Clarification about the $\epsilon$-net argument

810 Views Asked by At

I have been reading this paper : https://openreview.net/pdf?id=BJehNfW0-

In Corollary D.1, they reference this paper : https://arxiv.org/pdf/1703.00573.pdf which in Theorem B.2 constructs an $\epsilon$-net to get an upper bound for an expectation.

I have little background in topology so I'm not sure what the authors mean when they say they use "standard constructions" to obtain $\log |X| \leq O(p\log(LL\varphi p/ε))$ (in theorem B.2).

I'm also unsure as to where they get the probability $1 − \exp(−p)$ required for the proof from. Could someone more knowledgable clear this up for me?

Thanks in Advance.

1

There are 1 best solutions below

2
On

Note the Chernoff bound they have at the bottom of page 20. This in addition to the union bound using the control on $|\mathcal{X}|$ they offer will give the control \begin{align*} P(\exists v \in \mathcal{X} \textrm{ s.t. \{stuff\}}) &\le \exp( \log (2|\mathcal{X}|) - \epsilon^2m/2\Delta^2) \\ &\le \exp(C p \log (LL_\phi p/\epsilon) - \epsilon^2m/2\Delta^2 ) .\end{align*} Now simply pick $m$ large enough (top of page 21) to upper bound the stuff in the $\exp$ above by $-p.$ Note that there is a typo at the end of the first para on pg 21, which should read $\le \epsilon/4$, not $\ge.$

As for the $\epsilon$-net itself, this refers to nets in the context of metric spaces, not the topological notion (although the two are closely related). See this, and also this for quick definitions, and these notes of Wu for more detail.

For the purposes of this Q, we only need the covering property - consider a metric space $(\mathcal{M}, \rho)$ and a set $\mathcal{S}\subset \mathcal{M}$. We say that $\mathcal{C}$ is an $\epsilon$-covering of $\mathcal{S}$ if for every point $s \in S$, there exists a $c \in \mathcal{C}$ such that $\rho(s,c) \le \epsilon.$ Coverings are useful because while it is generally difficult to control the behaviour of an uncountable number of points in a probabilitstic context, coverings let you control a finite number via the union bound, and then argue that all the points you care about have similar behaviour because each of them is close to some point of the covering. This is precisely how this concept is used in the paper.

In their context, it is sufficient to find a set $\mathcal{X}$ that forms a $\frac{\epsilon}{4p L L_\phi}$-covering of the space $\mathcal{V}$ with the $\ell_\infty$ metric, at which point with the Lipshitzness of the $D_v$s and $\phi$ you can argue that the expectations they are interested in are $\epsilon$-close. It is a simple volume argument to bound the size of the $\mathcal{X}$ needed (see the notes by Wu for this).