PAC Learning Definition with 'if and only if' statement

In this question, the top answer references a definition of PAC learning from the book

Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David.

The definition in the book says:

A hypothesis class $\mathcal{H}$ is PAC learnable if there exist a function $m_{\mathcal{H}} : (0, 1)^2 \rightarrow \mathbb{N}$ and a learning algorithm $\mathcal{A}$ with the following property: for every $\epsilon, \delta \in (0, 1)$, every distribution $\mathcal{D}$ over $\mathcal{X}$, and every labelling function $f: \mathcal{X} \rightarrow \{0, 1\}$, when running the learning algorithm on $m \geq m_{\mathcal{H}}(\epsilon, \delta)$ i.i.d. examples generated by $\mathcal{D}$ and labelled by $f$, the algorithm returns a hypothesis $h$ such that, with probability of at least $1 − \delta$, $$L_{\mathcal{D}, f}(h) < \epsilon.$$

The answer to that question translates this into a mathematical statement containing an if-and-only-if in its definition, as opposed to an implication in a single direction (which is what I had assumed the definition was).

It isn't clear to me why the above definition should translate to

if ... then $\left(m \geq m_{\mathcal{H}}(\epsilon, \delta) \iff \mathbb{P}[L_{\mathcal{D}, f}(h) < \epsilon] \geq 1 - \delta \right)$

as opposed to just

if ... then $\left(m \geq m_{\mathcal{H}}(\epsilon, \delta) \implies \mathbb{P}[L_{\mathcal{D}, f}(h) < \epsilon] \geq 1 - \delta \right)$

Why is this? The answer in the original question points to 'page 23' of the book for an explanation of the iff, but that page makes no mention of PAC learning whatsoever.
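For concreteness, the fully quantified one-directional reading I had assumed is the following (quantifiers made explicit; $h$ denotes the hypothesis returned by $\mathcal{A}$ on the $m$ examples):

$$\exists\, m_{\mathcal{H}} : (0,1)^2 \rightarrow \mathbb{N},\ \exists\, \mathcal{A}\ \ \forall \epsilon, \delta \in (0,1)\ \ \forall \mathcal{D}\ \ \forall f : \mathcal{X} \rightarrow \{0,1\}\ \ \forall m \geq m_{\mathcal{H}}(\epsilon, \delta): \quad \mathbb{P}\bigl[L_{\mathcal{D}, f}(h) < \epsilon\bigr] \geq 1 - \delta.$$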

BEST ANSWER

Observe that if, for every $\varepsilon$ and $\delta$, such an $m_{\mathcal{H}}(\varepsilon, \delta)$ exists, then you can always pick the minimum such $m_{\mathcal{H}}(\varepsilon, \delta)$ (valid for every distribution). With that choice, the reverse implication is also true, and immediately so.

As a matter of fact, the sample complexity is normally understood to be exactly this minimum. This is mentioned in the book.
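To sketch why the minimal choice gives the reverse direction (under the standard assumption, implicit here, that the guarantee is monotone in $m$, i.e. extra samples never hurt): define

$$m_{\mathcal{H}}(\epsilon, \delta) := \min \Bigl\{ m \in \mathbb{N} \;:\; \forall \mathcal{D}\ \forall f,\ \mathbb{P}\bigl[L_{\mathcal{D}, f}(h) < \epsilon\bigr] \geq 1 - \delta \Bigr\}.$$

By monotonicity the set on the right is upward closed, so membership in it is equivalent to being at least its minimum, which is exactly the double implication:

$$m \geq m_{\mathcal{H}}(\epsilon, \delta) \iff \mathbb{P}\bigl[L_{\mathcal{D}, f}(h) < \epsilon\bigr] \geq 1 - \delta \quad (\text{for all } \mathcal{D}, f).$$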

Conclusion: we can safely assume the double implication.