From Understanding Machine Learning: From theory to algorithms:
How explicitly is the union bound used in the proof of this theorem to get the result in the red box below?

Let your events be $A_n = \{\, |L_D(h) - L_S(h)| < \epsilon_n \text{ holds for all } h \in \mathcal{H}_n \,\}$.
$P(A_n^c) < \delta_n$ as per the previous paragraph, where I am using $c$ to denote the complement. Now, we are looking for the event that all the $A_n$'s happen simultaneously, i.e. for $P(\bigcap_n A_n) = 1 - P(\bigcup_n A_n^c)$. Applying the union bound to that last term, we get $P(\bigcup_n A_n^c) \leq \sum_n P(A_n^c) \leq \sum_n \delta_n$, so $P(\bigcap_n A_n)$ is at least $1 - \sum_n \delta_n$.
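As a minimal numeric sketch of this last step, assuming the common weighting $\delta_n = \delta \cdot 2^{-n}$ (so that $\sum_n \delta_n \leq \delta$; the book uses a general weight function $w(n)$, this particular choice is just for illustration):

```python
# Union-bound bookkeeping, assuming the weighting delta_n = delta * 2**(-n).
delta = 0.05
N = 50  # truncate the countable family of classes H_1, H_2, ... at N

# Failure probability allotted to each class H_n: P(A_n^c) < delta_n
deltas = [delta * 2 ** -(n + 1) for n in range(N)]

# Union bound: P(union of A_n^c) <= sum_n P(A_n^c) <= sum_n delta_n <= delta
total_failure = sum(deltas)
assert total_failure <= delta

# Hence all the A_n hold simultaneously with probability >= 1 - delta
print(f"P(all A_n hold) >= {1 - total_failure:.6f}")
```

The geometric weights are what let a single budget $\delta$ be split across countably many classes while the total failure probability stays below $\delta$.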
Adding the elements of the set as per the OP's request: the theorem says that the space we are working in is $\Omega = \mathcal{D}^m$ (writing $m$ for the sample size, to avoid clashing with the class index $n$), and we pick some $S$ in it. So, more formally, we have $A_n = \{\, S \in \mathcal{D}^m : |L_D(h) - L_S(h)| < \epsilon_n \text{ holds for all } h \in \mathcal{H}_n \,\}$.
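To make the event $A_n$ concrete, here is a hypothetical Monte Carlo sketch (not from the book): draws of $S \sim \mathcal{D}^m$ in a toy setting where each hypothesis's true loss is known exactly, checking how often $|L_D(h) - L_S(h)| < \epsilon$ holds for every $h$ in a small finite class.

```python
import random

# Toy illustration of the event A_n over draws of S ~ D^m.
# Labels are Bernoulli(1/2); each "hypothesis" h predicts a constant label,
# so its true loss L_D(h) is exactly 0.5.
random.seed(0)

m, eps, trials = 500, 0.1, 2000
hypotheses = [0, 1]
true_loss = 0.5

hits = 0
for _ in range(trials):
    S = [random.randint(0, 1) for _ in range(m)]  # one sample S of size m
    emp_loss = {h: sum(y != h for y in S) / m for h in hypotheses}
    # The event A_n: the empirical loss is eps-close to the true loss
    # uniformly over the class.
    if all(abs(emp_loss[h] - true_loss) < eps for h in hypotheses):
        hits += 1

print(f"empirical P(A_n) ~ {hits / trials:.3f}")
```

With $m = 500$ and $\epsilon = 0.1$, Hoeffding's inequality makes the failure probability tiny, so the estimate should come out very close to 1.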