Diversity of a Set & Hypercube Vertices related to Statistical Machine Learning


I am reading a paper on Statistical Machine Learning and am having a hard time visualizing/understanding some terminology (my background is EE).

The part that I am having a hard time with is this blurb (I will explain what I think I understand after quoting the text):

Let $Q(z,\alpha), \alpha \in \Lambda$ be a set of indicator functions, that is, functions which take on only the values zero or one. Consider a sample $z_1, \ldots, z_l$. Let us characterize the diversity of this set of functions on the given sample by the quantity $N^\Lambda(z_1, \ldots, z_l)$, which represents the number of different separations of this sample that can be obtained using functions from the given set of indicator functions.

Let us write this in another form. Consider the set of $l$-dimensional binary vectors $q(\alpha) = \left( Q(z_1,\alpha), Q(z_2, \alpha), \ldots, Q(z_l, \alpha) \right), \alpha \in \Lambda$, that one obtains as $\alpha$ takes various values from $\Lambda$. Then, geometrically speaking, $N^\Lambda(z_1, \ldots, z_l)$ is the number of different vertices of the $l$-dimensional cube that can be obtained on the basis of the sample $z_1, \ldots, z_l$ and the set of functions $Q(z,\alpha), \alpha \in \Lambda$.

I'm trying to understand in English what these two paragraphs mean. My doubts are:

  1. What is meant by "diversity of this set of functions"? Intuitively, I take this to mean how different the functions $Q(z,\alpha), \alpha \in \Lambda$ are from one another as $\alpha$ varies. True? False? And why is the diversity $N^\Lambda$ a function of $z_1, \ldots, z_l$ and not of $\alpha$?
  2. Is $N^\Lambda$ a standard mathematical operator? I am unaware of one by that name, and the paper does not explain how to calculate the diversity of the set (referring to paragraph 1).
  3. I am able to visualize the set of $l$-dimensional binary vectors (one for each value of $\alpha$). This part is easy because $Q(z,\alpha)$ is defined to be an indicator function, i.e. it can only take on the values 0 or 1. From there, though, I am unable to make the leap to "geometrically speaking, $N^\Lambda(z_1, \ldots, z_l)$ is the number of different vertices of the $l$-dimensional cube that can be obtained on the basis of the sample $z_1, \ldots, z_l$ and the set of functions $Q(z,\alpha), \alpha \in \Lambda$."

For those that are interested, the paper I am reading is: An Overview of Statistical Learning Theory, by Vladimir Vapnik (IEEE Transactions on Neural Networks).

Thanks!

1 Answer
  1. It is not just "diversity of this set of functions"; it is "diversity of this set of functions *on the given sample*". That is, $\Lambda$ is fixed, and $N$ depends on $z_1, \ldots, z_l$. At least that's how I translate it back into Russian.

  2. $N^\Lambda$ is not a standard operator; as far as I know, it is introduced in this paper. (It is the quantity that later came to be known in VC theory as the growth function, or shattering coefficient.)

  3. If I read this correctly: applying a particular function from the set $Q$ to all the sample points gives you one $l$-dimensional binary vector, i.e. one vertex of the $l$-dimensional cube $\{0,1\}^l$. If each and every function produces the same vector, the diversity of the set is very low. The more distinct vectors (vertices) you obtain, the more diverse the set is, on the given sample of course. $N^\Lambda(z_1, \ldots, z_l)$ is simply the count of these distinct vertices.
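To make point 3 concrete, here is a minimal sketch (my own illustration, not from the paper) that counts the distinct binary vectors directly. The family of functions used here is an assumption: $\Lambda$ indexes threshold functions on the real line, $Q(z,\alpha) = \mathbf{1}[z \le \alpha]$; the names `diversity`, `sample`, and `alphas` are likewise mine.

```python
def diversity(sample, alphas, Q):
    """Count distinct binary vectors (Q(z_1,a), ..., Q(z_l,a)) as a ranges
    over the given parameter values -- i.e. the number of distinct vertices
    of the l-dimensional cube reached by the function family on this sample."""
    vectors = {tuple(Q(z, a) for z in sample) for a in alphas}
    return len(vectors)

# Assumed example family: threshold indicator functions on the real line.
Q = lambda z, alpha: 1 if z <= alpha else 0

sample = [0.2, 0.5, 0.9]        # l = 3 sample points
alphas = [0.0, 0.3, 0.6, 1.0]   # a few parameter values from Lambda

print(diversity(sample, alphas, Q))  # prints 4
```

Here the four thresholds produce the vertices $(0,0,0)$, $(1,0,0)$, $(1,1,0)$, $(1,1,1)$, so $N^\Lambda = 4$, even though the 3-cube has $2^3 = 8$ vertices: this family cannot shatter the sample, which is exactly the kind of limitation the growth function measures.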

That said, Vapnik is notoriously hard to follow even in Russian.