Shannon [ref] introduced entropy as a measure of information and listed three main properties it should satisfy (page 49). My question concerns the third property: if a choice is broken down into successive choices, the original entropy $H$ should be the weighted sum of the individual values of $H$, as illustrated in figure 6 on page 49.
How can we measure the structural difference between the two representations? In particular, the right-hand decomposition seems to encode some knowledge, i.e. a certain way of classifying the choices. Is there a way to quantify that knowledge?
For example, if we take the decomposition in the figure, we get the following relations:
$$\underbrace{H(\frac{1}{4}, \frac{1}{4}, \frac{1}{3}, \frac{1}{6})}_{(a)} = \underbrace{H(\frac{1}{4}, \frac{1}{4}, \frac{1}{2}) + \frac{1}{2} H(\frac{2}{3}, \frac{1}{3})}_{(b)} = \underbrace{H(\frac{1}{2}, \frac{1}{2}) + \frac{1}{2} H(\frac{2}{3}, \frac{1}{3}) + \frac{1}{2} H(\frac{1}{2}, \frac{1}{2})}_{(c)}$$
We can see that the choices $\{A, B, C, D\}$ in (a) were decomposed according to two schemes, (b) and (c). The question, again, is how to quantitatively differentiate between the topologies (a), (b) and (c), and to say, for example, that (c) is a better classification of the choices $\{A, B, C, D\}$.
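As a numerical sanity check (a sketch in Python, using only the standard library; the function name `H` is mine), one can verify that all three expressions really do evaluate to the same number, which underlines that the entropy value by itself cannot distinguish the topologies:

```python
from math import log2

def H(*p):
    """Shannon entropy of a probability distribution, in bits."""
    assert abs(sum(p) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(x * log2(x) for x in p if x > 0)

# (a): the flat distribution over the four choices {A, B, C, D}
a = H(1/4, 1/4, 1/3, 1/6)

# (b): group {C, D} into one branch of mass 1/2, then split it as (2/3, 1/3)
b = H(1/4, 1/4, 1/2) + 1/2 * H(2/3, 1/3)

# (c): first a binary split, then both branches split in turn
c = H(1/2, 1/2) + 1/2 * H(2/3, 1/3) + 1/2 * H(1/2, 1/2)

print(a, b, c)  # all three agree up to floating-point error
```

So any quantity that separates (a), (b) and (c) would have to look at the tree structure of the decomposition, not at the resulting entropy.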
Thank you.