Error in Data Partitions with refinement property


Consider the following data set:

A B C D
1 a \$ F
1 A - T
2 A \$ D
2 A \$ F
2 b - L
3 b \$ O
3 C - F
3 C # R

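Numbering the rows 1 to 8 from top to bottom, each attribute A, B, C, and D (and each combination of attributes) defines a partition of the rows, in which two rows fall in the same subset when they agree on every attribute considered: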

$\operatorname{Part}(A) = \{\{1, 2\}, \{3, 4, 5\}, \{6, 7, 8\}\}$

$\operatorname{Part}(B) = \{\{1\}, \{2, 3, 4\}, \{5, 6\}, \{7,8\}\}$

$\operatorname{Part}(C) = \{\{1, 3, 4, 6\}, \{2, 5, 7\}, \{8\}\}$

$\operatorname{Part}(D) = \{\{1, 4, 7\}, \{2\}, \{3\}, \{5\}, \{6\}, \{8\}\}$

$\operatorname{Part}(AB) = \{\{3, 4\}, \{1\}, \{2\}, \{5\}, \{6\}, \{7, 8\}\}$

$\operatorname{Part}(ABC) = \{\{3, 4\}, \{1\}, \{2\}, \{5\}, \{6\}, \{7\}, \{8\}\}$

$\operatorname{Part}(ABCD) = \{\{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{6\}, \{7\}, \{8\}\}$

etc.
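The partitions above can be reproduced by grouping row numbers on the values of the chosen attributes. The following is a minimal sketch; the names `ROWS` and `partition` are illustrative and not part of the original question.

```python
from collections import defaultdict

# The eight rows of the example table, in order (row 1 first).
ROWS = [
    {"A": 1, "B": "a", "C": "$", "D": "F"},
    {"A": 1, "B": "A", "C": "-", "D": "T"},
    {"A": 2, "B": "A", "C": "$", "D": "D"},
    {"A": 2, "B": "A", "C": "$", "D": "F"},
    {"A": 2, "B": "b", "C": "-", "D": "L"},
    {"A": 3, "B": "b", "C": "$", "D": "O"},
    {"A": 3, "B": "C", "C": "-", "D": "F"},
    {"A": 3, "B": "C", "C": "#", "D": "R"},
]


def partition(rows, attrs):
    """Group 1-based row numbers by their values on the attributes in `attrs`."""
    groups = defaultdict(list)
    for number, row in enumerate(rows, start=1):
        key = tuple(row[a] for a in attrs)  # e.g. (2, "A") for attrs "AB"
        groups[key].append(number)
    return sorted(groups.values())


print(partition(ROWS, "A"))   # subsets of rows sharing the same A value
print(partition(ROWS, "AB"))  # subsets of rows sharing the same (A, B) pair
```

Passing a string such as `"AB"` works because iterating over it yields the single-letter attribute names.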

I have this formula that calculates the error of a partition:

$\operatorname{err}(\operatorname{Part}(X)) = \|\operatorname{Part}(X)\| - |\operatorname{Part}(X)|$

where:

  • $\|\operatorname{Part}(X)\|$ is the total number of elements across all subsets of the partition (8 in the example)
  • $|\operatorname{Part}(X)|$ is the number of subsets in the partition (for example, 3 for Part(A) and 4 for Part(B))

In the example, the error values are:

$\operatorname{err(Part(A))} = 8 - 3 = 5$

$\operatorname{err(Part(B))} = 8 - 4 = 4$

$\operatorname{err(Part(C))} = 8 - 3 = 5$

$\operatorname{err(Part(D))} = 8 - 6 = 2$

$\operatorname{err(Part(AB))} = 8 - 6 = 2$

$\operatorname{err(Part(ABC))} = 8 - 7 = 1$

$\operatorname{err(Part(ABCD))} = 8 - 8 = 0$
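As a quick check of these values, the error formula can be written directly in code. This is a sketch that takes a partition as a list of sets of row numbers; the function name `err` simply mirrors the notation of the question.

```python
def err(part):
    """Total number of elements in the partition minus its number of subsets."""
    total = sum(len(block) for block in part)
    return total - len(part)


part_a = [{1, 2}, {3, 4, 5}, {6, 7, 8}]
part_b = [{1}, {2, 3, 4}, {5, 6}, {7, 8}]
part_ab = [{3, 4}, {1}, {2}, {5}, {6}, {7, 8}]

for name, part in [("A", part_a), ("B", part_b), ("AB", part_ab)]:
    print(name, err(part))
```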

The main problem is that for a set of several attributes, e.g., AB, ABC, or ABCD, calculating $Part(X)$ requires intersecting the partitions of all the attributes involved in X: every subset of one partition has to be intersected with every subset of the others, keeping the non-empty results.
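To make the cost of that step concrete, here is a minimal sketch of the pairwise intersection for two partitions. The helper name `partition_product` is an assumption chosen for illustration, and the quadratic loop over subsets is the naive approach, not an optimized one.

```python
def partition_product(p, q):
    """Intersect every subset of p with every subset of q, keeping non-empty results."""
    result = []
    for block_p in p:
        for block_q in q:
            common = block_p & block_q
            if common:
                result.append(common)
    return result


part_a = [{1, 2}, {3, 4, 5}, {6, 7, 8}]
part_b = [{1}, {2, 3, 4}, {5, 6}, {7, 8}]

part_ab = partition_product(part_a, part_b)
print(sorted(sorted(block) for block in part_ab))
```

The result has six subsets, which matches Part(AB) as listed earlier.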

Is there a way to define the value of $f(x)$ in:

$\operatorname{err(Part(AB))} = 8 - f(x) $

knowing only the total number of elements (i.e. 8) and the errors $\operatorname{err(Part(A))}$ and $\operatorname{err(Part(B))}$?

In other words, is there a way to find a correlation between $\operatorname{err(Part(A))}$, $\operatorname{err(Part(B))}$ and $\operatorname{err(Part(AB))}$?



It is not possible to calculate $\mathrm{err}(\mathrm{Part}(AB))$ from $\mathrm{err}(\mathrm{Part}(A))$ and $\mathrm{err}(\mathrm{Part}(B))$ alone.

For example, take the data set $X_1$:

A B
1 a
1 A
2 A
2 A

and the data set $X_2$:

A B
1 a
1 a
2 A
2 A

In both cases, $\mathrm{err}(\mathrm{Part}(A)) = \mathrm{err}(\mathrm{Part}(B)) = 4 - 2 = 2$. However:

  • For data set $X_1$, $\mathrm{Part}(AB) = \{\{1\}, \{2\}, \{3, 4\}\}$, so $\mathrm{err}(\mathrm{Part}(AB)) = 4 - 3 = 1$.
  • For data set $X_2$, $\mathrm{Part}(AB) = \{\{1, 2\}, \{3, 4\}\}$, so $\mathrm{err}(\mathrm{Part}(AB)) = 4 - 2 = 2$.
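The counterexample can be checked mechanically. The sketch below uses the fact that the error equals the number of rows minus the number of distinct value combinations on the chosen columns. The names `X1`, `X2`, and `err` are illustrative, and columns are referred to by position (0 for A, 1 for B).

```python
X1 = [(1, "a"), (1, "A"), (2, "A"), (2, "A")]
X2 = [(1, "a"), (1, "a"), (2, "A"), (2, "A")]


def err(rows, cols):
    """Number of rows minus the number of distinct value tuples on `cols`."""
    keys = [tuple(row[c] for c in cols) for row in rows]
    return len(keys) - len(set(keys))


for name, data in [("X1", X1), ("X2", X2)]:
    print(name, err(data, [0]), err(data, [1]), err(data, [0, 1]))
```

Both data sets give the same pair of single-attribute errors but different errors for AB, so no function of the two single-attribute errors and the row count can return the AB error in general.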