Generalization Bounds


Given the loss function $L(\hat{y},y)$

the generalization error is defined as $$R(h) = \underset{(x,y)\sim D}{\mathrm{E}}[L(h(x),y)]$$

the empirical error is defined as $$\hat R(h) = \frac{1}{m}\sum_{i=1}^{m} L(h(x_i),y_i)$$
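To make the gap between $\hat R(h)$ and $R(h)$ concrete, here is a small numerical sketch of my own (the distribution, the threshold hypothesis, and the 10% label noise are all assumptions for illustration, not from the slides): $\hat R(h)$ is a sample average, while $R(h)$ can only be approximated, e.g. with a much larger Monte Carlo sample.

```python
import random

random.seed(0)

def sample(n):
    """Draw n points from a toy distribution D: x ~ Uniform(0,1),
    clean label 1[x > 0.5], flipped with probability 0.1."""
    data = []
    for _ in range(n):
        x = random.random()
        y = (x > 0.5) != (random.random() < 0.1)  # noisy label
        data.append((x, int(y)))
    return data

def h(x):
    # A fixed (deliberately imperfect) hypothesis: threshold at 0.4
    return int(x > 0.4)

def zero_one_loss(y_hat, y):
    return int(y_hat != y)

def empirical_risk(h, data):
    # hat R(h) = (1/m) * sum of losses over the sample
    return sum(zero_one_loss(h(x), y) for x, y in data) / len(data)

train = sample(50)       # small sample -> empirical error hat R(h)
huge = sample(200_000)   # large sample -> Monte Carlo estimate of R(h)
print("empirical error  :", empirical_risk(h, train))
print("approx. gen. err.:", empirical_risk(h, huge))
```

For this toy setup the true risk works out to $0.4\cdot0.1 + 0.1\cdot0.9 + 0.5\cdot0.1 = 0.18$, and the large-sample estimate lands close to it, while the 50-point empirical error can deviate noticeably.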

Bayes error $$R^* = \inf_{\substack{h\\ h\ \text{measurable}}} R(h)$$

The difference between the error of a hypothesis $h \in H$ and the Bayes error can be decomposed as: $$R(h) - R^* = \underbrace{R(h) - R(h^*)}_{\text{estimation}} + \underbrace{R(h^*) - R^*}_{\text{approximation}}$$ where $h^*$ is a hypothesis in $H$ with minimal error, i.e. a *best-in-class* hypothesis.

Empirical risk minimization (ERM) selects the hypothesis $h_0 \in H$ minimizing the empirical error: $$h_0 = \underset{h \in H}{\operatorname{argmin}}\, \hat R(h)$$
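The argmin above can be carried out exactly when $H$ is finite. Below is a minimal sketch (the class of 101 threshold classifiers and the noisy toy distribution are my own assumptions, not from the slides) that picks the empirical risk minimizer $h_0$ by brute force:

```python
import random

random.seed(1)

def sample(n):
    """Toy distribution: x ~ Uniform(0,1), y = 1[x > 0.5] with 10% label noise."""
    out = []
    for _ in range(n):
        x = random.random()
        y = int((x > 0.5) != (random.random() < 0.1))
        out.append((x, y))
    return out

def emp_risk(thresh, data):
    """Empirical 0-1 error of the threshold classifier h_t(x) = 1[x > t]."""
    return sum(int((x > thresh) != y) for x, y in data) / len(data)

data = sample(500)
H = [i / 100 for i in range(101)]  # finite hypothesis class: 101 thresholds
h_erm = min(H, key=lambda t: emp_risk(t, data))  # ERM: argmin of hat R over H
print("ERM threshold:", h_erm, "empirical error:", emp_risk(h_erm, data))
```

By construction $\hat R(h_0) \leq \hat R(h)$ for every $h \in H$; this inequality is exactly what the estimation-error bound below relies on.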

A bound on the estimation error for the ERM hypothesis $h_0$ is given by \begin{align*} R(h_0) - R(h^*) &= R(h_0) - \hat R(h_0) + \hat R(h_0) - R(h^*)\\ &\leq R(h_0) - \hat R(h_0) + \hat R(h^*) - R(h^*)\\ &\leq 2\,\underset{h\in H}{\sup}\,|R(h) - \hat R(h)| \end{align*} where the first inequality uses $\hat R(h_0) \leq \hat R(h^*)$, which holds because $h_0$ minimizes the empirical error over $H$.

I am reading about generalization bounds in a set of PowerPoint slides, and I couldn't understand how we get the third inequality from the second. The proof above leads to the statement below, and I couldn't make sense of that either.

$$\Pr\left[\,\underset{h\in H}{\sup}\;\big(R(h) - \hat R(h)\big) > \epsilon\,\right]$$