PAC learning bound: sample dependence through ERM


I'm trying to understand the dependence of a function $\hat{f}$ selected by empirical risk minimization (ERM) on its finite sample $S$.

Suppose we have a dataset $S = \{s_i\}_{i=1}^m$ with $s_i := (x_i,y_i) \sim P$ drawn i.i.d. Then for any $\hat{f}$ such that $$\mathcal{E}_{true}[\hat{f}] := \mathbb{E}_{(x,y)\sim P} [\mathbb{I}_{\{\hat{f}(x)\not=y\}}] = \pi,$$ we have that $e_{\hat{f}}(s) := \mathbb{I}_{\{\hat{f}(x)\not=y\}}$, for a draw $s = (x,y) \sim P$, is a random variable with $e_{\hat{f}}(s) \sim \text{Bernoulli}(\pi)$.

However, if we choose $\hat{f} \in \mathcal{F}$ by empirical risk minimization on the finite sample $S$, then for $s \in S$ we no longer have $e_{\hat{f}}(s) \sim \text{Bernoulli}(\pi)$, because $e_{\hat{f}}(s)$ depends on $S$ through the choice of $\hat{f}$.

Can you help me clarify why $e_{\hat{f}}(s) \sim \text{Bernoulli}(\pi)$ doesn't hold in this case? I know it has to do with using $S$ both to select $\hat{f}$ and to estimate the error, but the details are eluding me (so I'm probably missing a fundamental probability fact). It feels strange, because for any fixed $f$ with $\mathcal{E}_{true}[f]=\pi$ we do have $e_{f}(s) \sim \text{Bernoulli}(\pi)$.
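One way to see the dependence concretely is to simulate it. Below is a minimal sketch (the threshold class, sample size, and label distribution are my own illustrative choices, not from the question): labels are fair coins independent of $x$, so every threshold classifier has true error $\pi = 0.5$. For a fixed classifier the per-point errors are i.i.d. $\text{Bernoulli}(0.5)$ and the training error averages $0.5$, but the training error of the ERM pick is systematically below $0.5$, because ERM chooses whichever classifier happens to look best on $S$. On a fresh point, the ERM classifier's error is again a fair coin.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 20                               # sample size
trials = 2000                        # Monte Carlo repetitions
thresholds = np.linspace(0, 1, 21)   # finite class F of threshold classifiers h_t(x) = 1{x > t}

fixed_train, erm_train, erm_test = [], [], []
for _ in range(trials):
    x = rng.uniform(size=m)
    y = rng.integers(0, 2, size=m)   # labels independent of x: every h_t has true error 0.5

    # Fixed classifier h_{0.5}: its per-point errors are i.i.d. Bernoulli(0.5).
    fixed_train.append(np.mean((x > 0.5).astype(int) != y))

    # ERM: pick the threshold minimizing training error on this sample S.
    errs = [np.mean((x > t).astype(int) != y) for t in thresholds]
    t_hat = thresholds[int(np.argmin(errs))]
    erm_train.append(min(errs))

    # Evaluate the ERM classifier on a fresh point not in S.
    x_new, y_new = rng.uniform(), rng.integers(0, 2)
    erm_test.append(int((x_new > t_hat) != y_new))

print(f"fixed classifier, training error:   {np.mean(fixed_train):.3f}")  # close to 0.5
print(f"ERM classifier,   training error:   {np.mean(erm_train):.3f}")    # clearly below 0.5
print(f"ERM classifier,   fresh-point error: {np.mean(erm_test):.3f}")    # close to 0.5 again
```

The optimistic gap between the ERM training error and $0.5$ is exactly the selection effect: $e_{\hat{f}}(s_i)$ for $s_i \in S$ is not $\text{Bernoulli}(0.5)$, even though the same classifier's error on a fresh draw is.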

Further, if we knew the distribution of $e_{\hat{f}}(s)$ when $\hat{f}$ comes from ERM, could we derive tighter bounds than those given by the union bound?
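For context on that last question, the standard workaround is precisely the union bound: Hoeffding's inequality applies to each fixed $f$ (where the i.i.d. Bernoulli structure does hold), and a uniform bound over $\mathcal{F}$ then covers whichever $\hat{f}$ ERM selects. A sketch for finite $\mathcal{F}$, writing $\hat{\mathcal{E}}_S[f]$ for the empirical error on $S$ (my notation):

```latex
% Hoeffding, for a single fixed f whose per-point errors are i.i.d. Bernoulli:
\Pr\big( \big| \hat{\mathcal{E}}_S[f] - \mathcal{E}_{true}[f] \big| > \varepsilon \big)
  \le 2 e^{-2 m \varepsilon^{2}}

% Union bound over the finite class, valid for the data-dependent \hat{f}:
\Pr\big( \exists f \in \mathcal{F} :
  \big| \hat{\mathcal{E}}_S[f] - \mathcal{E}_{true}[f] \big| > \varepsilon \big)
  \le 2 \lvert \mathcal{F} \rvert \, e^{-2 m \varepsilon^{2}}
```

The second display holds simultaneously for every $f \in \mathcal{F}$, so it applies to $\hat{f}$ without ever needing the (hard-to-characterize) distribution of $e_{\hat{f}}(s)$; the price is the factor $\lvert \mathcal{F} \rvert$.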


I think there is an error in the first display in your question.

$\mathbb{E}_{(x,y)\sim P}[\mathbb{I}_{\{\hat f(x)\not=y\}}]$ is still a random variable, since $\hat f$ is estimated from the data: the expectation integrates out only the fresh draw $(x,y)$, not $S$. So it cannot equal a fixed number $\pi$. Conditionally on $S$, the error on a fresh point is $e_{\hat f}(s) \sim \text{Bernoulli}(\pi(S))$ with the random parameter $\pi(S) := \mathcal{E}_{true}[\hat f]$.
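To make this point concrete, here is a minimal sketch under a distribution where the true error of every threshold classifier is available in closed form (the distribution, noise level, and grid are my own illustrative choices): $x \sim \mathrm{Unif}(0,1)$, $y = \mathbb{I}_{\{x > 0.3\}}$ flipped with probability $0.1$, so $\mathcal{E}_{true}[h_t] = 0.1 + 0.8\,|t - 0.3|$. Running ERM on many independent samples $S$ shows that $\mathcal{E}_{true}[\hat f]$ fluctuates from sample to sample, i.e. it is genuinely a random variable, not a constant $\pi$.

```python
import numpy as np

rng = np.random.default_rng(1)

m = 30                                # sample size
trials = 500                          # independent samples S
thresholds = np.linspace(0, 1, 101)   # finite class of threshold classifiers h_t(x) = 1{x > t}

def true_error(t):
    # x ~ Unif(0,1), y = 1{x > 0.3} flipped with prob 0.1, so by direct
    # integration the true risk of h_t is 0.1 + 0.8 * |t - 0.3|.
    return 0.1 + 0.8 * abs(t - 0.3)

risks = []
for _ in range(trials):
    x = rng.uniform(size=m)
    y = (x > 0.3).astype(int) ^ (rng.uniform(size=m) < 0.1).astype(int)
    errs = [np.mean((x > t).astype(int) != y) for t in thresholds]
    t_hat = thresholds[int(np.argmin(errs))]   # ERM pick depends on S
    risks.append(true_error(t_hat))

risks = np.array(risks)
# Nonzero spread across samples: E_true[f_hat] is a random function of S.
print(f"E_true[f_hat]: mean {risks.mean():.3f}, std {risks.std():.3f}")
```

The standard deviation is strictly positive, and the mean sits above the Bayes error $0.1$: the first display in the question only makes sense for a fixed $f$, or conditionally on $S$.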