Tightness of bound on true risk in the simplest optimistic case


Vapnik (Statistical Learning Theory) describes the "simplest optimistic" case of learning with empirical risk minimization as the case where at least one of the functions we are selecting from has error probability of $0$. More formally:

Consider the problem of minimizing the risk functional $$R(Q_i) = \int Q_i(z)\,dF(z)$$ given an empirical sample $z_1, z_2, \ldots, z_l$, over the finite set of indicator functions $\{Q_i(\cdot)\}_{i=1}^N$. In other words, we seek to minimize the empirical risk:

$$\hat{R}(Q_i) = \frac{1}{l}\sum_{j=1}^{l} Q_i(z_j)$$
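To make the setup concrete, here is a minimal sketch (the distribution $F$, the threshold functions, and all names are illustrative assumptions, not from Vapnik's text): empirical risk minimization over a finite class of indicator functions simply picks the function with the smallest error frequency on the sample.

```python
import numpy as np

rng = np.random.default_rng(0)

l, N = 20, 5
z = rng.uniform(size=l)  # sample z_1..z_l, here drawn from F = U[0, 1]

# a finite class of indicator functions Q_i(z) = 1{z > t_i}
thresholds = np.linspace(0.5, 0.9, N)
Q = [(lambda zz, t=t: (zz > t).astype(float)) for t in thresholds]

# empirical risk of each Q_i: fraction of sample points it labels as errors
empirical_risks = [Q_i(z).mean() for Q_i in Q]
best = int(np.argmin(empirical_risks))  # empirical risk minimizer
```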

Given that at least one of the functions $Q_i$ has true risk $R(Q_i) = 0$, we have the following bound for functions with zero empirical risk:

$$\mathbb{P}\left(\sup_{1\leq k \leq N}\left | R(Q_k) - \hat{R}(Q_k) \right | I(\hat{R}(Q_k)=0) > \epsilon\right) \leq (N-1)(1-\epsilon)^l$$

by applying a union bound, since a function $Q_k$ with true risk $R(Q_k) \geq \epsilon$ has empirical risk $0$ with probability at most $(1-\epsilon)^l$.
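The single-function fact behind the bound can be checked by simulation (a minimal Monte Carlo sketch; the values of `eps` and `l` are arbitrary illustrative choices): a function with true risk $\epsilon$ has all $l$ i.i.d. samples fall outside its error set, and hence empirical risk $0$, with probability exactly $(1-\epsilon)^l$.

```python
import numpy as np

rng = np.random.default_rng(1)
eps, l, trials = 0.1, 30, 200_000

# each row: l i.i.d. Bernoulli(eps) error indicators for one simulated sample
errors = rng.random((trials, l)) < eps
# empirical risk is 0 exactly when the whole row is error-free
p_hat = (errors.sum(axis=1) == 0).mean()

exact = (1 - eps) ** l  # ~0.0424 for these values
```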

Vapnik claims that this bound is tight and that equality is achieved in the case where our set of $N$ functions contains a single function with risk $0$, while the remaining $N-1$ functions determine statistically independent events $A_k = \{z \mid Q_k(z) > 0\}$ and all have the same risk $R(Q_k) = \epsilon$.
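Vapnik's tight construction can be simulated directly (a hedged sketch; `eps`, `l`, `N`, and the trial count are illustrative). Each sample point independently lands in each of the $N-1$ error sets $A_k$ with probability $\epsilon$, and the bad event (some risk-$\epsilon$ function attaining empirical risk $0$) should occur with frequency close to the union bound $(N-1)(1-\epsilon)^l$.

```python
import numpy as np

rng = np.random.default_rng(2)
eps, l, N, trials = 0.1, 30, 5, 100_000

# independent error indicators: for each trial, an l x (N-1) array whose
# column k marks the sample points falling in the error set A_k
errs = rng.random((trials, l, N - 1)) < eps
# the deviation reaches eps iff some column (function) is all zeros,
# i.e. some risk-eps function has empirical risk 0
event = (errs.sum(axis=1) == 0).any(axis=1)
p_hat = event.mean()

union_bound = (N - 1) * (1 - eps) ** l
```

For these values the observed frequency sits just below the union bound; the gap is the inclusion-exclusion correction, which shrinks as $(1-\epsilon)^l \to 0$.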

What I've Tried

The probability in question equals the probability that any of the $N-1$ remaining functions has empirical risk greater than $\epsilon$. Instead of a union bound we can consider the complement of the event: none of the $N-1$ remaining functions has an empirical risk greater than $\epsilon$. The empirical risk of any one of these $N-1$ remaining functions equals $0$ with probability $(1-\epsilon)^l$. Since these events are independent, we can take the product: $$\mathbb{P}\left(\sup_{1\leq k \leq N}\left | R(Q_k) - \hat{R}(Q_k) \right | I(\hat{R}(Q_k)=0) > \epsilon\right) = (1-\epsilon)^{l(N-1)}$$

but this does not show that the bound shown above is tight.

I must have made a mistake somewhere. I could use feedback on my explanation as well as my reasoning.

Best Answer

In the setting described, the inequalities in Vapnik's derivation become tight: the $N-1$ events $A_k$ are statistically independent and all have the same error probability $\epsilon$, so the union bound overcounts only by intersection terms, which are negligible when $(1-\epsilon)^l$ is small.

Your mistake is in treating the probability that the deviation is at most $\epsilon$ as the same as the probability that the empirical risk is $0$. The complement of "at least one of the $N-1$ functions has empirical risk $0$" is "none of them does," which has probability $(1-(1-\epsilon)^l)^{N-1}$; what you computed, $(1-\epsilon)^{l(N-1)}$, is instead the probability that all $N-1$ functions have empirical risk $0$ simultaneously.
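A numeric sketch of this point, with illustrative values of $\epsilon$, $l$, $N$: the complement argument done correctly gives $1-(1-(1-\epsilon)^l)^{N-1}$, which the union bound $(N-1)(1-\epsilon)^l$ matches to first order, whereas $(1-\epsilon)^{l(N-1)}$ is the probability that all $N-1$ functions have empirical risk $0$ at once and is far smaller.

```python
eps, l, N = 0.1, 30, 5

p0 = (1 - eps) ** l               # P(one risk-eps function has empirical risk 0)
exact = 1 - (1 - p0) ** (N - 1)   # P(at least one of the N-1 does)
union_bound = (N - 1) * p0        # Vapnik's bound
all_zero = p0 ** (N - 1)          # P(all N-1 do simultaneously)

print(exact, union_bound, all_zero)  # ~0.159, ~0.170, ~3e-6
```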