The book by Shalev-Shwartz and Ben-David contains a result which implies the following (via Theorem 26.3). Let $\text{ERM}$ denote a learning procedure (something mapping training sets to hypothesis functions) for a hypothesis class $\mathcal H$. Suppose the target function $f$ is contained in $\mathcal H$, so $\text{ERM}$ always returns a hypothesis with zero empirical risk. Then
$$\mathbb E[L(\text{ERM}(S))] \leq 2\,\mathbb E[R(S)] $$
where $L$ denotes the true risk, $S$ is a random sample of size $n$, and $R$ is the Rademacher complexity of the sample.
What is the point of this bound? The optimal bound for the left-hand side, across all possible ERM algorithms, is obvious: let $\text{ERM}_\text{worst}$ denote the procedure which, for any sample $S$, returns the function in $\mathcal H$ that is consistent with the sample but has maximum loss against the target function $f$. By construction, the expected loss of $\text{ERM}_\text{worst}$ is the tightest possible bound on the left-hand side. So the Rademacher bound above must be looser than that... so what is the motivation for it?
The notion of Rademacher complexity is "a" measure of the complexity of a class. For a class $\mathcal{H}$, we are interested in the quantity $$ \mathbb{E}\Big[\sup_{h\in \mathcal{H}}|L_D(h)-L_S(h)|\Big], $$ where $L_D$ is the population loss and $L_S$ is the empirical loss on the sample $S$. If we can bound this quantity, then for every function $h$, the finite-sample estimate $L_S(h)$ from $n$ samples is close to the population loss of $h$, "even if you pick $h$ based on $S$". Rademacher complexity helps us bound exactly this quantity.
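To make this concrete, here is a minimal Monte Carlo sketch of the empirical Rademacher complexity $R_S(\mathcal H) = \mathbb E_\sigma[\sup_{h\in\mathcal H}\frac1n\sum_i \sigma_i h(x_i)]$, using a hypothetical toy class (threshold functions on $[0,1]$) that I have chosen purely for illustration; the quantity and class are assumptions, not from the referenced theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hypothesis class (an assumption for illustration):
# thresholds on [0, 1], with h_t(x) = 1 if x >= t else 0.
thresholds = np.linspace(0.0, 1.0, 101)

def predictions(xs):
    """Matrix of h_t(x): one row per threshold t, one column per point x."""
    return (xs[None, :] >= thresholds[:, None]).astype(float)

def empirical_rademacher(xs, n_sigma=2000):
    """Monte Carlo estimate of E_sigma[ sup_h (1/n) sum_i sigma_i h(x_i) ]."""
    n = len(xs)
    H = predictions(xs)                    # shape (|H|, n)
    sigmas = rng.choice([-1.0, 1.0], size=(n_sigma, n))
    # For each sigma draw, the best correlation any h in the class achieves.
    sups = (sigmas @ H.T / n).max(axis=1)  # shape (n_sigma,)
    return sups.mean()

xs = rng.uniform(0.0, 1.0, size=50)
print(empirical_rademacher(xs))  # small for this simple class, shrinking with n
```

As $n$ grows the estimate shrinks, which is exactly what makes the uniform-deviation bound above useful.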
Regarding your question: we want to show that if an algorithm picks "any" empirical risk minimizer from a class with small Rademacher complexity, then it has small excess risk. In particular, you can take the ERM with the largest population risk, $h = \text{ERM}_\text{worst}(S)$; the same bound holds for this hypothesis as well. Please look at Thm. 26.5 for more on this.
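You can see this numerically in a toy setting. The sketch below (all choices here, the threshold class, the uniform marginal, and the target at $0.5$, are my assumptions for illustration) deliberately implements $\text{ERM}_\text{worst}$: among all consistent thresholds it returns the one with the largest true risk, and its average risk is still small, as the bound predicts for a class of low complexity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup (assumed for illustration): X ~ Uniform[0, 1],
# target f = threshold at 0.5, hypothesis class = all thresholds on [0, 1].
def worst_consistent_risk(n):
    """True risk of ERM_worst: the consistent threshold farthest from 0.5."""
    xs = np.sort(rng.uniform(0.0, 1.0, size=n))
    ys = (xs >= 0.5).astype(int)
    # Any threshold in (largest 0-labeled x, smallest 1-labeled x] is consistent.
    left = xs[ys == 0].max() if (ys == 0).any() else 0.0
    right = xs[ys == 1].min() if (ys == 1).any() else 1.0
    # Under Uniform[0,1], the true risk of threshold t against f is |t - 0.5|.
    return max(abs(left - 0.5), abs(right - 0.5))

risks = [worst_consistent_risk(50) for _ in range(2000)]
print(np.mean(risks))  # small on average: even the worst consistent ERM generalizes
```

The point is that the bound is algorithm-independent: it controls every consistent hypothesis at once, so it applies to $\text{ERM}_\text{worst}$ with no extra work.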