The question concerns the proof of Theorem 7.2 in *Understanding Machine Learning* by Shalev-Shwartz & Ben-David. The authors argue in the following way: suppose $\mathcal{H}$ is non-uniformly learnable, that is,
Def: There exist an algorithm $A$ and a function $m_{\mathcal{H}}^{NUL}:(0,1)^2 \times \mathcal{H} \rightarrow \mathbb{N}$ such that for any $\epsilon, \delta\in (0,1)$ and any $h\in\mathcal{H}$, if $m\geq m_{\mathcal{H}}^{NUL}(\epsilon, \delta, h)$, then for any distribution $\mathcal{D}$ over the sample space, if the size of the training sample $S$ equals $m$, then with probability at least $1-\delta$ we have $$L_{\mathcal{D}}(A(S)) \leq L_{\mathcal{D}}(h) + \epsilon,$$ where $L_{\mathcal{D}}$ denotes the true risk, i.e. the probability of misclassification.
Now, the theorem in question says that the class is non-uniformly learnable if and only if it is a countable union of hypothesis classes, each with finite VC dimension. The proof in the book defines $$\mathcal{H}_n = \{h\in \mathcal{H}:m_{\mathcal{H}}^{NUL}(1/8, 1/7, h) \leq n\}.$$ Obviously, $\mathcal{H}$ is the union of the $\mathcal{H}_n$. However, it is not clear to me why each $\mathcal{H}_n$ has finite VC dimension. The proof invokes the Fundamental Theorem of Statistical Learning, which provides a lower bound on the sample size required to guarantee $(\epsilon, \delta)$ accuracy and confidence, but that theorem assumes a finite VC dimension, which is precisely what we want to show. Therefore, the argument appears on the surface to be circular. It is also not clear to me why it would break down if I replaced $(1/8, 1/7)$ with $(1/2, 1)$.
Any insight would be much appreciated!
Statement: $\mathrm{VCdim}(\mathcal{H}_n) \leq 2n$.

Proof: Suppose for contradiction that $\mathrm{VCdim}(\mathcal{H}_n) \geq 2n+1$, and let $C$ be a set shattered by $\mathcal{H}_n$ with $|C| = 2n+1$. Run the No-Free-Lunch argument (the proof of Theorem 5.1 in the book) over the domain $C$ with sample size $n$: for a labeling $f : C \to \{0,1\}$, let $\mathcal{D}_f$ be the distribution that is uniform over $C$ with labels given by $f$. Any sample of size $n$ misses at least $|C| - n = n+1$ points of $C$, and under a uniformly random labeling the algorithm errs on an unseen point with probability $1/2$, so for some $f$ $$\mathbb{E}_{S \sim \mathcal{D}_f^n}\big[L_{\mathcal{D}_f}(A(S))\big] \geq \frac{1}{2}\cdot\frac{n+1}{2n+1} > \frac{1}{4},$$ and hence, by the reverse-Markov inequality (Lemma B.1 in the book, for a random variable with values in $[0,1]$), $$\Pr_{S}\Big[L_{\mathcal{D}_f}(A(S)) > \tfrac{1}{8}\Big] \geq \frac{\mathbb{E}[L_{\mathcal{D}_f}(A(S))] - 1/8}{1 - 1/8} > \frac{1/4 - 1/8}{7/8} = \frac{1}{7}.$$ Because $C$ is shattered, some $h \in \mathcal{H}_n$ agrees with $f$ on $C$, so $L_{\mathcal{D}_f}(h) = 0$. But $h \in \mathcal{H}_n$ means $m_{\mathcal{H}}^{NUL}(1/8, 1/7, h) \leq n$, so with probability at least $6/7$ we have $L_{\mathcal{D}_f}(A(S)) \leq L_{\mathcal{D}_f}(h) + 1/8 = 1/8$, i.e. $\Pr_S[L_{\mathcal{D}_f}(A(S)) > 1/8] \leq 1/7$ — a contradiction. Hence $\mathrm{VCdim}(\mathcal{H}_n) \leq 2n$. $\square$

The argument is therefore contrapositive, not circular: only the lower-bound direction of the Fundamental Theorem is used, and that direction is exactly the No-Free-Lunch argument above, which assumes nothing about the VC dimension — it applies to any class that shatters a large enough set. Only the upper-bound direction (that a certain sample size suffices) presupposes finite VC dimension.

This also shows why $(1/8, 1/7)$ cannot be replaced by $(1/2, 1)$: with $\delta = 1$ the requirement "with probability at least $1 - \delta = 0$" is vacuously true for every $h$ and every $m$, so $\mathcal{H}_n$ would place no constraint on $A$ at all; and even with a nontrivial $\delta$, an accuracy parameter $\epsilon = 1/2$ is not contradicted by the error level $>1/8$ that the No-Free-Lunch construction forces. Any pair with $\epsilon \leq 1/8$ and $\delta \leq 1/7$ would work; those particular constants are simply what the No-Free-Lunch computation delivers.
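For reference, here is the lower-bound ingredient in the book's notation — a paraphrase of the No-Free-Lunch theorem (Theorem 5.1 in Shalev-Shwartz & Ben-David):

```latex
% No-Free-Lunch (paraphrase of Theorem 5.1):
% let A be any learning algorithm for binary classification over a
% domain X, and let the training-set size m satisfy m \le |X|/2.
% Then there exists a distribution \mathcal{D} over X \times \{0,1\}
% such that
\exists\, f : X \to \{0,1\}\ \text{with}\ L_{\mathcal{D}}(f) = 0,
\qquad\text{and}\qquad
\Pr_{S \sim \mathcal{D}^m}\!\Big[\, L_{\mathcal{D}}(A(S)) \ \ge\ \tfrac{1}{8} \,\Big] \ \ge\ \tfrac{1}{7}.
```

Note that this statement makes no finiteness assumption on the VC dimension of any class — it constrains every algorithm over a sufficiently large domain — which is why invoking it to bound $\mathrm{VCdim}(\mathcal{H}_n)$ from above is not circular.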