The book by Shalev-Shwartz and Ben-David contains a result which implies the following (via Theorem 26.3). Let $\text{ERM}$ denote a learning procedure (something mapping training sets to hypothesis functions) for a hypothesis class $\mathcal H$. Suppose the target function $f$ is contained in $\mathcal H$, so $\text{ERM}$ always returns a hypothesis with zero empirical risk. Then
$$\mathbb E[L(\text{ERM}(S))] \leq 2\,\mathbb E[R(S)] $$
where $L$ denotes the true risk, $S$ is a random sample of size $n$, and $R$ is the Rademacher complexity of the sample.
What is the point of this bound? The optimal bound for the left-hand side, across all possible ERM algorithms, is obvious: let $\text{ERM}_\text{worst}$ denote the procedure which, for any sample $S$, returns the function in $\mathcal H$ that is consistent with the sample but has maximum loss against the target function $f$. By construction, the expected loss of $\text{ERM}_\text{worst}$ is the tightest possible bound on the left-hand side. So the Rademacher bound above must be looser than that... so what is the motivation for it?
The notion of Rademacher complexity is "a" measure of the complexity of a class. For a class $\mathcal{H}$, we are interested in the quantity $$ \mathbb{E}\Big[\sup_{h\in \mathcal{H}}|L_D(h)-L_S(h)|\Big], $$ where $L_D$ is the population loss and $L_S$ is the empirical loss on the sample $S$. If we can bound this quantity, then for every function $h$, the finite-sample estimate $L_S(h)$ from $n$ samples is close to the population loss of $h$, "even if you pick $h$ based on $S$". Rademacher complexity helps us bound exactly this quantity.
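To make this concrete, here is a minimal Monte Carlo sketch of the empirical Rademacher complexity $R_S(\mathcal H) = \mathbb E_\sigma[\sup_{h\in\mathcal H}\frac1n\sum_i \sigma_i h(x_i)]$, using a hypothetical toy class (threshold functions on $[0,1]$) that I have chosen purely for illustration; the quantity and class are assumptions, not from the referenced theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hypothesis class (an assumption for illustration):
# thresholds on [0, 1], with h_t(x) = 1 if x >= t else 0.
thresholds = np.linspace(0.0, 1.0, 101)

def predictions(xs):
    """Matrix of h_t(x): one row per threshold t, one column per point x."""
    return (xs[None, :] >= thresholds[:, None]).astype(float)

def empirical_rademacher(xs, n_sigma=2000):
    """Monte Carlo estimate of E_sigma[ sup_h (1/n) sum_i sigma_i h(x_i) ]."""
    n = len(xs)
    H = predictions(xs)                    # shape (|H|, n)
    sigmas = rng.choice([-1.0, 1.0], size=(n_sigma, n))
    # For each sigma draw, the best correlation any h in the class achieves.
    sups = (sigmas @ H.T / n).max(axis=1)  # shape (n_sigma,)
    return sups.mean()

xs = rng.uniform(0.0, 1.0, size=50)
print(empirical_rademacher(xs))  # small for this simple class, shrinking with n
```

As $n$ grows the estimate shrinks, which is exactly what makes the uniform-deviation bound above useful.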
Regarding your question: we want to show that if an algorithm picks "any" empirical risk minimizer from a class with small Rademacher complexity, then it has small excess risk. In particular, you can take the ERM with the largest population risk, $h = \text{ERM}_\text{worst}(S)$; the same bound holds for this hypothesis as well. Please look at Thm. 26.5 for more on this.
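You can see this numerically in a toy setting. The sketch below (all choices here, the threshold class, the uniform marginal, and the target at $0.5$, are my assumptions for illustration) deliberately implements $\text{ERM}_\text{worst}$: among all consistent thresholds it returns the one with the largest true risk, and its average risk is still small, as the bound predicts for a class of low complexity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup (assumed for illustration): X ~ Uniform[0, 1],
# target f = threshold at 0.5, hypothesis class = all thresholds on [0, 1].
def worst_consistent_risk(n):
    """True risk of ERM_worst: the consistent threshold farthest from 0.5."""
    xs = np.sort(rng.uniform(0.0, 1.0, size=n))
    ys = (xs >= 0.5).astype(int)
    # Any threshold in (largest 0-labeled x, smallest 1-labeled x] is consistent.
    left = xs[ys == 0].max() if (ys == 0).any() else 0.0
    right = xs[ys == 1].min() if (ys == 1).any() else 1.0
    # Under Uniform[0,1], the true risk of threshold t against f is |t - 0.5|.
    return max(abs(left - 0.5), abs(right - 0.5))

risks = [worst_consistent_risk(50) for _ in range(2000)]
print(np.mean(risks))  # small on average: even the worst consistent ERM generalizes
```

The point is that the bound is algorithm-independent: it controls every consistent hypothesis at once, so it applies to $\text{ERM}_\text{worst}$ with no extra work.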