I was trying to understand PAC bounds in the realizable case (i.e., when there is some perfect $h^* \in \mathcal{H}$ whose generalization error is zero).
Notation:
- Training data: $S_n$
- Training error of a hypothesis $h$ on $S_n$: $\mathcal{E}_n(h)$
- Generalization error: $\mathcal{E}(h)$
- Hypothesis class: $\mathcal{H}$
- Size of the hypothesis class: $|\mathcal{H}| = K$
- Probability, over the choice of training set, that the PAC bound fails: $\delta$
I would like to derive how $\epsilon$ relates to the other PAC variables $n$, $K$, and $\delta$.
Attempted derivation:
Let $h_{bad}$ be any classifier that generalizes poorly, i.e. $\mathcal{E}(h_{bad}) > \epsilon$. What is the probability that we would choose it after seeing the training set? Given the current $S_n$, we would choose it only if it has zero training error. So, what is the probability that this bad classifier makes no error on the training set?
$$P[h_{bad} \text{ makes no error on } S_n] = (1 - \mathcal{E}(h_{bad}))^n < (1-\epsilon)^n$$
This number is an upper bound on the probability that a bad classifier makes no errors on, and would therefore be chosen with, the current training data.
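For intuition, a quick numerical instance (the choice $\epsilon = 0.1$, $n = 50$ is arbitrary, just for illustration):

$$(1-\epsilon)^n = (1 - 0.1)^{50} = 0.9^{50} \approx 0.005,$$

so a single fixed bad classifier is quite unlikely to look perfect on 50 i.i.d. training examples, and this probability shrinks exponentially in $n$.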
From this I was trying to relate that upper bound to $\delta$, and to figure out whether the following statement is true:
$$(1 - \mathcal{E}(h))^n = \delta$$
If it's true, then one can conclude that:
$$\delta < (1-\epsilon)^n$$
This is the bound that makes sense to me, but it seems that I am missing one important variable, $|\mathcal{H}| = K$. For some reason the correct bound is supposed to be $\delta \leq |\mathcal{H}|(1-\epsilon)^n$, but I can't figure out why. I think my problem is that I don't understand the difference between $\delta$ and $(1 - \mathcal{E}(h))^n$, or the relation between them. I believe I got the correct relation, but I am not sure.
$P[h_{bad} \text{ makes no error on } S_n]$ is the probability that one particular bad classifier happens to look perfect on the training data. But $\delta$ has to bound the probability that *any* bad classifier in $\mathcal{H}$ looks perfect, since the learner may pick whichever consistent hypothesis it likes. By the union bound, the probability of that event is at most the sum of the individual probabilities, so we multiply the per-classifier bound by the number of potentially bad classifiers, which is at most $K$.
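To spell it out (this is the standard finite-hypothesis-class argument, using the notation from the question):

$$
P\Big[\exists\, h \in \mathcal{H} : \mathcal{E}(h) > \epsilon \ \text{ and } \ \mathcal{E}_n(h) = 0\Big]
\;\le\; \sum_{h \,:\, \mathcal{E}(h) > \epsilon} P\big[\mathcal{E}_n(h) = 0\big]
\;\le\; K (1-\epsilon)^n.
$$

Demanding that this failure probability be at most $\delta$, i.e. $K(1-\epsilon)^n \le \delta$, and using $(1-\epsilon)^n \le e^{-\epsilon n}$, it suffices that

$$
K e^{-\epsilon n} \le \delta
\quad\Longleftrightarrow\quad
n \;\ge\; \frac{1}{\epsilon}\ln\frac{K}{\delta},
$$

so with probability at least $1-\delta$ over the draw of $S_n$, every hypothesis that is consistent with the training data satisfies $\mathcal{E}(h) \le \frac{1}{n}\ln\frac{K}{\delta}$. This is where the extra factor of $|\mathcal{H}| = K$ enters: it pays for the fact that $\delta$ must cover all bad hypotheses simultaneously, not just one fixed $h_{bad}$.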