I have recently been studying the concept of entropy, and I have a fundamental question regarding the conceptual formulation of the information content of a random variable $X$, or equivalently, the uncertainty of a system/probabilistic experiment with $n$ possible outcomes. In either case, I am trying to understand why the information content, or uncertainty, has to be an additive function in the formulation process itself. Of course, $f(p) := k \log p$ satisfies $f(pq) = f(p) + f(q)$. But my question arises before information content or uncertainty is ever defined as a log function. More precisely:
(1) Information content (https://en.wikipedia.org/wiki/Entropy_(information_theory)): Consider a probability space $(\Omega, \mathcal{F}, P)$, where the three symbols in the parentheses denote respectively the sample space, a sigma-algebra of measurable sets ("events"), and the probability measure on $\Omega$. Now, we don't actually need a random variable to formulate the information content $I$; it is just a function defined on the events, $I : \mathcal{F} \to [0, \infty]$, such that:
(1) $I(\Omega)=0$ (this is intuitively clear, as the sure event doesn't bring us any information),
(2) $I(A) \leq I(B)$ if $P(A) \geq P(B)$ (intuitively clear, as rarer events are more informative), and
(3) $I(AB) = I(A) + I(B)$ whenever $A, B$ are independent (writing $AB$ for $A \cap B$). This is the part I fail to find an intuition for: why is it assumed that $I(AB) = I(A) + I(B)$ for independent $A, B$?
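For concreteness, here is a quick numeric check (my own illustration, taking the standard choice $I(A) = -\log_2 P(A)$, i.e. $k = -1/\ln 2$) that the log form does satisfy all three axioms:

```python
import math

def surprisal(p):
    """Information content of an event with probability p, in bits."""
    return -math.log2(p)

# (1) the sure event carries no information
assert surprisal(1.0) == 0.0

# (2) monotonicity: the rarer event is the more informative one
assert surprisal(0.1) > surprisal(0.5)

# (3) additivity over independent events, where P(AB) = P(A) * P(B)
pA, pB = 0.3, 0.25
assert math.isclose(surprisal(pA * pB), surprisal(pA) + surprisal(pB))
```

This only confirms that the log function is *a* solution; the question below is why additivity should be baked into the axioms in the first place.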
Let's try to examine some alternatives. If we had instead assumed $I(AB) = I(A)\,I(B)$ for independent $A, B$, then (2) could fail: since $P(AB) \leq P(A)$, axiom (2) forces $I(AB) \geq I(A)$; but if $I(B) < 1$, then $I(AB) = I(A)\,I(B) < I(A)$, a contradiction. So this doesn't work.
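The contradiction can be seen with concrete numbers (values of my own choosing):

```python
# Hypothetical multiplicative rule I(AB) = I(A) * I(B) for independent A, B.
# Pick I(B) < 1:
IA, IB = 2.0, 0.5
IAB = IA * IB  # = 1.0 under the multiplicative rule

# Since P(AB) <= P(A), axiom (2) demands I(AB) >= I(A).
# The multiplicative rule violates this:
assert IAB < IA
```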
What if we had instead defined $I(AB) = I(A) + I(B) + \phi(I(A), I(B))$ for independent $A, B$, where $\phi(\cdot,\cdot)$ satisfies: i) $\phi(x, 0) = 0$ for all $x$ and $\phi(0, y) = 0$ for all $y$ (so that intersecting with the sure event, which has $I = 0$, adds nothing, consistent with (1)), and ii) $\phi(x, y) = \phi(y, x)$ for all $x, y$ (symmetry)? What would be the problem if the information content were defined like this?
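To make the question concrete, here is one such $\phi$ (my own illustrative choice, not from any reference): take $\phi(x, y) = c\,x\,y$, which is symmetric and vanishes when either argument is $0$. Then $g := 1 + cI$ satisfies $g(AB) = g(A)\,g(B)$, and $g(p) = p^{-k}$ gives a decreasing solution $I(p) = (p^{-k} - 1)/c$, which recovers $-k \log p$ in the limit $c \to 0$. A numeric check:

```python
import math

c, k = 1.0, 1.0  # hypothetical parameters, chosen for illustration

def I(p):
    """Deformed surprisal: (p^{-k} - 1)/c, reducing to -k*log(p) as c -> 0."""
    return (p ** (-k) - 1.0) / c

def phi(x, y):
    """phi(x, y) = c*x*y: symmetric, and phi(x, 0) = phi(0, y) = 0."""
    return c * x * y

# Check the modified rule I(AB) = I(A) + I(B) + phi(I(A), I(B))
# on independent events with P(A) = 0.4, P(B) = 0.7:
pA, pB = 0.4, 0.7
lhs = I(pA * pB)
rhs = I(pA) + I(pB) + phi(I(pA), I(pB))
assert math.isclose(lhs, rhs)
```

So at least one non-additive $I$ of this form exists and is monotone, which is exactly why I am asking what singles out the additive case.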
I understand the last definition looks unnecessarily complicated, and so perhaps counterintuitive; and perhaps Shannon was inspired by the idea of entropy in thermodynamics. But still I wonder!