(Entropy) Surprise of observing two independent events $A$ and $B$ (i.e., of observing $A \cap B$)


I saw the following explanation:

(Entropy). The surprise of learning that an event with probability $p$ happened is defined as $\log_2(1/p)$, measured in a unit called bits. Low-probability events have high surprise, while an event with probability $1$ has zero surprise. The $\log$ is there so that if we observe two independent events $A$ and $B$, the total surprise is the same as the surprise from observing $A \cap B$. The $\log$ is base $2$ so that if we learn that an event with probability $1/2$ happened, the surprise is $1$, which corresponds to having received $1$ bit of information.
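The additivity claim in the quoted definition can be checked numerically. A minimal sketch in Python (the probabilities $0.5$ and $0.25$ are arbitrary example values, not from the quote):

```python
import math

def surprise(p):
    """Surprise (self-information) of an event with probability p, in bits."""
    return math.log2(1 / p)

# Two independent events A and B (example probabilities, chosen arbitrarily)
p_a, p_b = 0.5, 0.25

# For independent events, P(A ∩ B) = P(A) P(B)
p_ab = p_a * p_b

# The log turns the product of probabilities into a sum of surprises:
# log2(1/(p_a * p_b)) = log2(1/p_a) + log2(1/p_b)
print(surprise(p_a) + surprise(p_b))  # 3.0
print(surprise(p_ab))                 # 3.0
```

Both lines print the same value, which is exactly the point of using a $\log$: the surprise of the joint event equals the sum of the individual surprises.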

So if we do observe two independent events $A$ and $B$ (i.e., observe $A \cap B$), what does the $\log$ look like? In other words, what is the mathematics of such a case? We have $H(X) = \sum_{j=1}^n p_j \log_2(1/p_j)$ for a single discrete r.v. $X$ whose distinct possible values are $a_1, a_2, \dots, a_n$, with probabilities $p_1, p_2, \dots, p_n$ respectively (so $p_1 + p_2 + \cdots + p_n = 1$). And since $P(A \cap B) = P(A)P(B)$ for independent events $A$ and $B$, would the analogous quantity be $H(A \cap B) = \sum_{i=1}^n \sum_{j=1}^n p_i p_j \log_2(1/p_i) \log_2(1/p_j)$, or something of the sort? Thank you.
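For concreteness, the single-variable entropy formula quoted above can be coded directly. This is only a sketch of the definition as stated in the question (the fair-coin example is an illustration I am adding, not part of the original):

```python
import math

def entropy(probs):
    """H(X) = sum over j of p_j * log2(1/p_j), in bits,
    for a discrete r.v. with the given probabilities p_1, ..., p_n."""
    assert math.isclose(sum(probs), 1.0), "probabilities must sum to 1"
    # Terms with p_j = 0 contribute 0 by convention, so skip them.
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Example: a fair coin has entropy 1 bit, matching the "1 bit of
# information" remark in the quoted definition.
print(entropy([0.5, 0.5]))  # 1.0
```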