Let $F$ be a set of functions mapping $\mathbb{R}^n$ to $[0,1]$ with pseudo-dimension $d$, and let $D$ be a distribution over $\mathbb{R}^n \times [0,1]$. We know that for any $\epsilon, \delta \in (0,1)$, with probability $1-\delta$ over the draw of $N = \Theta\left(\frac{1}{\epsilon^2} \left(d + \log \frac{1}{\delta}\right)\right)$ points $\left(\vec{x}_1, y_1\right), \dots, \left(\vec{x}_N, y_N\right) \sim D$, every function $f \in F$ satisfies $$\left|\mathbb{E}_{\left(\vec{x}, y\right) \sim D}\left[\left|f\left(\vec{x}\right) - y\right|\right] - \frac{1}{N} \sum_{i = 1}^N\left|f\left(\vec{x}_i\right) - y_i\right|\right| \leq \epsilon.$$ In particular, if $f$ is consistent with the data, i.e., $f\left(\vec{x}_i\right) = y_i$ for all $i \in [N]$, then its empirical error is zero, so $\mathbb{E}_{\left(\vec{x}, y\right) \sim D}\left[\left|f\left(\vec{x}\right) - y\right|\right] \leq \epsilon$.
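To make the consequence concrete, here is a minimal simulation sketch for a toy instance of this setup: the class of threshold functions $f_t(x) = \mathbb{1}[x \geq t]$ on $\mathbb{R}$, which has pseudo-dimension $1$. The constant hidden in $\Theta(\cdot)$ is unspecified in the statement, so the code simply takes it to be $1$; the distribution, the true threshold, and the consistent-hypothesis rule are all illustrative choices, not part of the statement above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters of the uniform-convergence statement (constant in Theta taken as 1).
eps, delta, d = 0.05, 0.05, 1
N = int((d + np.log(1 / delta)) / eps**2)

# Realizable toy data: x ~ Uniform[0,1], labels from a true threshold (assumption).
true_t = 0.3
x = rng.uniform(0, 1, size=N)
y = (x >= true_t).astype(float)

# A hypothesis consistent with the sample: the largest threshold with
# zero empirical error, namely the smallest positively-labeled point.
t_hat = x[y == 1.0].min() if y.any() else 1.0

# Estimate the expected error E|f(x) - y| on fresh draws; by the statement,
# a consistent f should have expected error at most eps.
x_test = rng.uniform(0, 1, size=200_000)
y_test = (x_test >= true_t).astype(float)
err = np.abs((x_test >= t_hat).astype(float) - y_test).mean()
print(f"N = {N}, estimated expected error = {err:.4f} (eps = {eps})")
```

For this class the expected error of the consistent hypothesis is exactly $t_{\hat{}} - t$, which concentrates well below $\epsilon$, so the simulation only illustrates the guarantee, not its tightness.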
I'm wondering whether there is a high-probability variant of this statement. For example, given $\epsilon, \alpha, \delta \in (0,1)$, how many samples $\left(\vec{x}_1, y_1\right), \dots, \left(\vec{x}_N, y_N\right) \sim D$ are sufficient to ensure that with probability $1-\delta$ over the draw of these $N$ samples, every $f \in F$ with $f\left(\vec{x}_i\right) = y_i$ for all $i \in [N]$ satisfies $\mathbb{P}_{\left(\vec{x}, y\right) \sim D}\left[\left|f\left(\vec{x}\right) - y\right| > \epsilon \right] < \alpha$? Ideally, the sample complexity would grow only logarithmically in both $\frac{1}{\delta}$ and $\frac{1}{\alpha}$. If this is not possible in general, is it possible for special function classes, such as linear functions?