VC dimension of union and intersection

3.1k Views Asked by At

Let $\mathbb{A}$ and $\mathbb{B}$ be sets of subsets of $X$ of finite VC dimension. Show that:

(a) If $\mathbb{C}=\{A\cap B : A \in \mathbb{A},B \in \mathbb{B}\}$, then $\prod_{\mathbb{C}}(n)\leq\prod_{\mathbb{A}}(n) \cdot \prod_{\mathbb{B}}(n) $

(b) If $\mathbb{C}= \mathbb{A}\cup \mathbb{B}$, then $\prod_{\mathbb{C}}(n)\leq\prod_{\mathbb{A}}(n) + \prod_{\mathbb{B}}(n) $

The definition is of Vapnik-Chernovenkis (VC) dimension of a set of classifiers $\mathbb{H}$ is $$\text{VC}(\mathbb{H})=\text{max}\{n\in\mathbb{N}:\prod_{\mathbb{H}}(n)=2^n\}$$

The growth function $\prod_{\mathbb{H}}$ is defined as:

$$\prod_{\mathbb{H}}(n)=\text{max}_{\mathbb{x}\in X^n}|\{h(\mathbb{x}):h\in \mathbb{H}\}|.$$

Attempt for (a):

Fix a set $X$ of $n$ points. Let $Y_1,...,Y_k$ be the set of intersections of $\mathbb{A}$ with $X$. By defintion of $\prod_{\mathbb{A}}(X),k\leq\prod_{\mathbb{A}}(X)\leq \prod_{\mathbb{A}}(n)$. By definition $\prod_{\mathbb{B}}(Y_i)$, the intersection of the $\mathbb{B}$ with $Y_i$ is at most $\prod_{\mathbb{B}}(Y_i)\leq \prod_{\mathbb{B}}(n)$. Thus, the number of sets intersections of $\mathbb{C}$ with $X$ is at most

$$k \prod_{\mathbb{B}}(Y_i)\leq \prod_{\mathbb{A}}(n) \prod_{\mathbb{B}}(n)$$

I'm not sure how to go further for (a)

2

There are 2 best solutions below

1
On BEST ANSWER

$\newcommand{\m}[1]{\mathbb{#1}}$ $\newcommand{\r}[1]{\mathrm{#1}}$Yes, I think the answer sort of follows from definition, as I mentioned. But because this is bountied I'd like to put just the little more effort into this one.


What is this VC dimension of a set of "classifiers"? Well, basically a classifier on a set is like a rule : if an element is part of the classifying set, then it satisfies that rule, otherwise it does not. Now, a set of classifiers shatter a set, if every subset of that set can be shown as exactly those following a certain set of rules. Basically, for this set, if you wanted to isolate any fixed subset , you could do it using the given classifiers.

For example, suppose you had $\{1,2,3,4,5\}$ and you want to get the subset $\{2,3,5\}$ by applying some rules. If one of the rules was "prime numbers only" then you can apply this rule to the set and get $\{2,3,5\}$. But suppose there were only the rules : "only even numbers" and "only odd numbers" then you can't actually get $\{2,3,5\}$ from these classifiers alone , and would need other rules to help.

The "strength" of a classification is obviously based on how many elements it can show apart from one another. Hence, the largest possible set that can be shattered by a set of classifiers, is its VC dimension. The exercise that you have been assigned, tells you that if you take the intersection of two sets of classifiers, and similarly if you take their union, these sets of classifiers have limited strength (a statement which can be translated into VC dimension bounds).


Rigorously, in your language now, starting on part $b$. We want to show that $\Pi_{\m{C}}(n) \leq \Pi_{\m{A}}(n) + \Pi_{\m{B}}(n)$. So let $\r{x} \in X^n$, and consider the set $|\{h(\r{x}) : h \in \m{C}\}|$. We know that $\m{C} = \m{A} \cup \m{B}$, so in particular we have $$\{h(\r{x}) : h \in \m{C}\} = \{h(\r{x}) : h \in \m{A}\} \cup \{h(\r{x}) : h \in \m{B}\} \\ \implies |\{h(\r{x}) : h \in \m{C}\}| \leq |\{h(\r{x}) : h \in \m{A}\}| +|\{h(\r{x}) : h \in \m{B}\}|$$

and now we can take the maximum over $\r{x} \in X^n$ on both sides, and the maximum is sub-additive over sets so it splits on the left hand side, leading to : $$ \max_{\r{x} \in X^n} |\{h(\r{x}) : h \in \m{C}\}| \leq \max_{\r{x} \in X^n}\left\{|\{h(\r{x}) : h \in \m{A}\}| +|\{h(\r{x}) : h \in \m{B}\}|\right\} \\ \leq \max_{\r{x} \in X^n}\left\{|\{h(\r{x}) : h \in \m{A}\}|\right\} +\max_{\r{x} \in X^n}\left\{|\{h(\r{x}) : h \in \m{B}\}|\right\} $$

which gives the result.


Your logic for $a$ is actually quite good, but just has a few writing errors.

Suppose $\m{C} = \{A \cap B : A \in \m{A} , B \in \m{B}\}$, and let $\r{x} \in X^n$. Consider the set $\{h(\r{x}) : h \in \m{C}\}$, we would like to describe this set. To do that, note that $h = p \cap q$ for some $p \in \m{A}$ and $q \in \m{B}$, so in particular we have $\{h(x) : h \in \m{C}\} = \{q(p(\r{x})) : p \in \m{A} , q \in \m{B}\}$. Why is this true? Well, $h(\r{x})$ is applying rule $h$ on $\r{x}$, but rule $h$ is basically two rules put together, one of $\m{A}$ and one of $\m{B}$, so applying $h$ is like applying both these rules (the order of rule application does not matter, because at the heart of it , rule application is intersection of sets which is commutative and associative. So I could have written $p(q(\r{x}))$ above as well).

From here we get : $$ \{h(x) : h \in \m{C}\} = \{q(p(\r{x})) : p \in \m{A} , q \in \m{B}\} = \cup_{p \in \m{A}}\{q(p(\r{x})) : q \in \m{B}\} $$

and now taking the sizes and using the union bound along with the definitions : $$ |\{h(x) : h \in \m{C}\}| \leq \sum_{p \in \m{A}} |\{q(p(\r{x})) : q \in \m{B}\}| = \sum_{a \in T} |\{q(a) : q \in \m{B}\}| $$

where $T = \{p(\r{x}) : p \in \m{A}\}$. But then $|T| \leq \Pi_{\m{A}}$ and each summand above is $\leq \Pi_{\m{B}}$, so for each $\r{x}$ the LHS is less than or equal to $\Pi_{\m{A}}\Pi_{\m{B}}$ and the result follows by taking a maximum over $\r{x}$ on the LHS.


To give both proofs in words :

$b$ : If you take the union of two sets of classifiers, then any set classified by the union must already have been previously classified by at least one of the two original sets, therefore the union bound may be used.

$a$ : If you form a "refined" classifying set by combining classifiers from both kinds one after the other and taking all such pairs, then any set which is classified by this "refinement" must go through a classifier of one kind, followed by a classifier of the other kind. These may be treated independently : the classifier of the first kind can only produce so many different sets regardless of input, and the same goes for the second. Finally, we can just multiply these together, as we do for independent events in combinatorics.

2
On

As to (a): your answer is correct.

As to (b): cuts/dichotomies in $\Pi_{\mathbb{C}}(n)$ can be generated from $\mathbb{A}$ or from $\mathbb{B}$. Therefore (b) follows straightforwardly.