VC theory provides an answer to Problem 1 specified below. I am wondering what is known about a similar issue, Problem 2.
$$ ~ $$
Problem 1
Let $X$ be a set, let $\mathcal{D}$ be a distribution over $X$, let $F$ be a set of functions from $X$ to $\{0,1\}$. Let $\varepsilon,\delta \in (0,1)$, and let $x_1,x_2,\dots,x_m$ be a set of i.i.d. samples from $\mathcal{D}$. Find the minimal $m$ such that with probability at least $1-\delta$ over the samples,
$$ \sup_{f \in F}\Big|\mathbb{E}_{x \sim \mathcal{D}}[f(x)] - \frac{1}{m}\sum_{i=1}^m f(x_i)\Big| \leq \varepsilon. $$
It is well known that the answer is $m = \Theta(\mathsf{VC}(F))$, where $\mathsf{VC}(F)$ is the VC dimension of $F$.
$$ ~ $$
Problem 2
Under the same conditions, find the minimal $m$ such that with probability at least $1-\delta$ over the samples,
$$ \Big|\sup_{f \in F}\mathbb{E}_{x \sim \mathcal{D}}[f(x)] - \sup_{f \in F}\frac{1}{m}\sum_{i=1}^m f(x_i)\Big| \leq \varepsilon. $$
Clearly, taking $m = \Theta(\mathsf{VC}(F))$ as before is sufficient to achieve this, but in some cases $m$ can be much smaller. I am wondering if there are any better bounds on $m$ that are known? Can someone refer me to any papers on this issue?
$$ ~ $$
I am aware of some related papers in the context of passive testing in computer science (e.g. here and here), but I am wondering if there is a more direct or comprehensive treatment from a statistics perspective? I know that some closely related topics have been studied extensively in empirical process theory, including an important inequality by Talagrand for the suprema of empirical processes (e.g. here and here), but so far I haven't found a result in empirical process theory that directly addresses my specific question. That isn't my field, so I could very well be missing something basic.
Cross posted from mathoverflow due to lack of response.