I'm considering the following data set:
| A | B | C | D |
|---|---|---|---|
| 1 | a | \$ | F |
| 1 | A | - | T |
| 2 | A | \$ | D |
| 2 | A | \$ | F |
| 2 | b | - | L |
| 3 | b | \$ | O |
| 3 | C | - | F |
| 3 | C | # | R |
For each attribute $A$, $B$, $C$, and $D$, and for each combination of attributes, it is possible to define the following partitions of the rows (numbered 1 to 8):
$\operatorname{Part}(A) = \{\{1, 2\}, \{3, 4, 5\}, \{6, 7, 8\}\}$
$\operatorname{Part}(B) = \{\{1\}, \{2, 3, 4\}, \{5, 6\}, \{7,8\}\}$
$\operatorname{Part}(C) = \{\{1, 3, 4, 6\}, \{2, 5, 7\}, \{8\}\}$
$\operatorname{Part}(D) = \{\{1, 4, 7\}, \{2\}, \{3\}, \{5\}, \{6\}, \{8\}\}$
$\operatorname{Part}(AB) = \{\{3, 4\}, \{1\}, \{2\}, \{5\}, \{6\}, \{7, 8\}\}$
$\operatorname{Part}(ABC) = \{\{3, 4\}, \{1\}, \{2\}, \{5\}, \{6\}, \{7\}, \{8\}\}$
$\operatorname{Part}(ABCD) = \{\{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{6\}, \{7\}, \{8\}\}$
etc.
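I have this formula, which calculates the error of a partition:
$\operatorname{err}(\operatorname{Part}(X)) = \lVert\operatorname{Part}(X)\rVert - |\operatorname{Part}(X)|$
where:
- $\lVert\operatorname{Part}(X)\rVert$ is the total number of elements in the data set (8 in the example)
- $|\operatorname{Part}(X)|$ is the number of subsets in the partition (for example, 3 for $\operatorname{Part}(A)$ and 4 for $\operatorname{Part}(B)$)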
In the example, the error values are:
$\operatorname{err}(\operatorname{Part}(A)) = 8 - 3 = 5$
$\operatorname{err}(\operatorname{Part}(B)) = 8 - 4 = 4$
$\operatorname{err}(\operatorname{Part}(C)) = 8 - 3 = 5$
$\operatorname{err}(\operatorname{Part}(D)) = 8 - 6 = 2$
$\operatorname{err}(\operatorname{Part}(AB)) = 8 - 6 = 2$
$\operatorname{err}(\operatorname{Part}(ABC)) = 8 - 7 = 1$
$\operatorname{err}(\operatorname{Part}(ABCD)) = 8 - 8 = 0$
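As a cross-check of the values above, here is a minimal Python sketch that builds each partition directly from the table and evaluates the error formula. The names `ROWS`, `partition`, and `err` are my own choices for illustration and are not part of the original problem statement.

```python
from collections import defaultdict

# The eight rows of the example table, in order (row i is element i).
ROWS = [
    {"A": 1, "B": "a", "C": "$", "D": "F"},
    {"A": 1, "B": "A", "C": "-", "D": "T"},
    {"A": 2, "B": "A", "C": "$", "D": "D"},
    {"A": 2, "B": "A", "C": "$", "D": "F"},
    {"A": 2, "B": "b", "C": "-", "D": "L"},
    {"A": 3, "B": "b", "C": "$", "D": "O"},
    {"A": 3, "B": "C", "C": "-", "D": "F"},
    {"A": 3, "B": "C", "C": "#", "D": "R"},
]


def partition(rows, attrs):
    """Group 1-based row indices by their values on the given attributes."""
    groups = defaultdict(set)
    for i, row in enumerate(rows, start=1):
        key = tuple(row[a] for a in attrs)
        groups[key].add(i)
    return list(groups.values())


def err(rows, attrs):
    """Number of rows minus number of blocks in the partition."""
    return len(rows) - len(partition(rows, attrs))


# "AB" is iterated character by character, i.e. as attributes A and B.
for attrs in ["A", "B", "C", "D", "AB", "ABC", "ABCD"]:
    print(attrs, err(ROWS, attrs))
```

This computes every partition from the raw rows, which is exactly the cost I would like to avoid for combinations of attributes.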
The main problem is that, for a set of several attributes such as $AB$, $ABC$, or $ABCD$, computing $\operatorname{Part}(X)$ requires intersecting the partitions of all the attributes involved in $X$.
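The intersection step can be sketched as follows, assuming each partition is stored as a list of Python sets. The helper name `refine` is hypothetical; it intersects every block of one partition with every block of the other and keeps the non-empty results.

```python
def refine(p, q):
    """Common refinement of two partitions: all non-empty pairwise block intersections."""
    return [a & b for a in p for b in q if a & b]


# Part(A) and Part(B) as listed in the question.
part_a = [{1, 2}, {3, 4, 5}, {6, 7, 8}]
part_b = [{1}, {2, 3, 4}, {5, 6}, {7, 8}]

part_ab = refine(part_a, part_b)
print(part_ab)           # six blocks, matching Part(AB)
print(8 - len(part_ab))  # err(Part(AB)) = 2
```

Note that this needs the blocks themselves, not just their counts, which is why the question about working from the error values alone arises.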
Is there a way to determine the value of $f(x)$ in:
$\operatorname{err}(\operatorname{Part}(AB)) = 8 - f(x)$
knowing only the total number of elements (i.e. 8) and the errors $\operatorname{err}(\operatorname{Part}(A))$ and $\operatorname{err}(\operatorname{Part}(B))$?
In other words, is there a relationship between $\operatorname{err}(\operatorname{Part}(A))$, $\operatorname{err}(\operatorname{Part}(B))$, and $\operatorname{err}(\operatorname{Part}(AB))$?
It is not possible to calculate $\mathrm{err}(\mathrm{Part}(AB))$ from $\mathrm{err}(\mathrm{Part}(A))$ and $\mathrm{err}(\mathrm{Part}(B))$ alone.
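To make this concrete, here is a minimal Python sketch using two hypothetical four-row data sets, `rows_1` and `rows_2`. They are my own illustration and are not necessarily the $X_1$ and $X_2$ referred to in this answer. Both have $\mathrm{err}(\mathrm{Part}(A)) = \mathrm{err}(\mathrm{Part}(B)) = 2$, yet their values of $\mathrm{err}(\mathrm{Part}(AB))$ differ.

```python
def err(rows, attrs):
    """Number of rows minus number of distinct value combinations on attrs."""
    blocks = {tuple(row[a] for a in attrs) for row in rows}
    return len(rows) - len(blocks)


# Hypothetical data set 1: B changes exactly when A changes.
rows_1 = [
    {"A": 1, "B": "x"},
    {"A": 1, "B": "x"},
    {"A": 2, "B": "y"},
    {"A": 2, "B": "y"},
]

# Hypothetical data set 2: B varies independently of A.
rows_2 = [
    {"A": 1, "B": "x"},
    {"A": 1, "B": "y"},
    {"A": 2, "B": "x"},
    {"A": 2, "B": "y"},
]

for name, rows in [("rows_1", rows_1), ("rows_2", rows_2)]:
    print(name, err(rows, "A"), err(rows, "B"), err(rows, "AB"))
# rows_1 2 2 2
# rows_2 2 2 0
```

Since the inputs $(n, \mathrm{err}(\mathrm{Part}(A)), \mathrm{err}(\mathrm{Part}(B)))$ are identical in both cases while the output differs, no function of those inputs alone can give $\mathrm{err}(\mathrm{Part}(AB))$.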
For example, take the data set $X_1$:
and the data set $X_2$:
You can see that, in both cases, $\mathrm{err}(\mathrm{Part}(A)) = \mathrm{err}(\mathrm{Part}(B)) = 2$. However: