Number of rules (Data Mining)

58 Views Asked by At

This is a paraphrasing of a statement from Data Mining by Witten et al., 4th edition, on section 1.6.

Consider the set $S = \{1, \dots, 288\}$.

A rule is defined as a one-element singleton set whose element is from $S$, e.g., $\{1\}$, $\{2\}$, etc..

The question: how many sets with no more than $14$ rules can be created?

My work: There are $288$ such rules. The result should then be given by $$\sum_{i=1}^{14}\binom{288}{i} \sim 2.36 \times 10^{23}\text{.}$$

See here for the computation.

The correct answer: Witten et al. states that the answer is $\sim 2.7 \times 10^{34}$ with no explanation of the derivation.

Hopefully my terminology is correct. If someone knows this material well and I have something incorrect in my problem setup, please let me know.

Further context which might be helpful: this statement arises from looking at a data set of weather variables. We have five variables per observation, for which each variable has $4$, $4$, $3$, $3$, and $2$ variables. Thus, there are $4 \times 4 \times 3 \times 3 \times 2 = 288$ possibilities for each rule, according to Witten et al. It is possible for a variable to not participate in a rule, but every rule must have at least one variable.

If we restrict the rule set to contain no more than $14$ rules [based on a previous example], there are around $2.7 \times 10^{34}$ possible different rule sets.

The following slides were taken from www.cse.unt.edu/~nielsen/classes/CSCE5380/Slides/Chapter1.pptx:

enter image description here enter image description here enter image description here