If the statements

All crows are black

and

All non-black things are non-crows

are logically equivalent, then why is the former so much easier to communicate by giving examples? What implications does this have for information theory?
This is an interesting question. Here's a stab: let the statement be $A$. I'm considering two ways of convincing you of $A$: 1) pointing out crows and demonstrating that they are black, or 2) pointing out non-black things and demonstrating that they are not crows.
Now, there are far more non-black things than there are crows. So if I point out a crow and demonstrate that it is black, I've made further progress towards convincing you than I would have if I'd pointed out a non-black thing and demonstrated that it was not a crow.
So I'm likely to choose strategy 1).
Now I'm trying to think of an example where strategy 2) is the better bet, i.e. where there are far more "crows" than there are "non-black" things.
EDIT: Mark Dominus points out that this is a known thing; it's called Hempel's Paradox. They use ravens, but that's probably not important.
Let $\mathcal{U}$ be the finite universal set of all things under the Sun. Let $\mathcal{B}$ be the set of all black things. Let $\mathcal{C}$ be the set of all crows. Taking the statement to be true, every crow is black; since there are also non-black things and black things that are not crows, we have
$$\mathcal{C} \subset \mathcal{B} \subset \mathcal{U}$$
Suppose that a friend living far away is thinking about a thing. As this friend has many interests and thinks about many things, we are conservative and assume that the PMF of the thing being thought of is uniform over $\mathcal{U}$. Hence, the measure of our uncertainty regarding our friend's thought is $\log_2 |\mathcal{U}|$ bits. If our friend sends us the message
"It is a crow,"

then our uncertainty has been reduced to $\log_2 |\mathcal{C}|$ bits, i.e., our friend's message contained
$$\log_2 |\mathcal{U}| - \log_2 |\mathcal{C}| = \log_2 \left(\frac{|\mathcal{U}|}{|\mathcal{C}|}\right) > 0$$
bits of information. However, if our friend sends us the message
"It is not black,"

then our uncertainty has been reduced to $\log_2 (|\mathcal{U}|-|\mathcal{B}|)$ bits, i.e., our friend's message contained
$$\log_2 |\mathcal{U}| - \log_2 (|\mathcal{U}|-|\mathcal{B}|) = \log_2 \left(\frac{|\mathcal{U}|}{|\mathcal{U}|-|\mathcal{B}|}\right) > 0$$
bits of information. If there are more non-black things than crows, which is a most reasonable assumption to make, then $|\mathcal{U}|-|\mathcal{B}| > |\mathcal{C}|$ and, thus,
$$\log_2 \left(\frac{|\mathcal{U}|}{|\mathcal{C}|}\right) > \log_2 \left(\frac{|\mathcal{U}|}{|\mathcal{U}|-|\mathcal{B}|}\right)$$
i.e., the former message ("crow") contains more information than the latter one ("non-black").
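To make the comparison concrete, here is a small sketch of the computation. The counts below are made-up, order-of-magnitude assumptions purely for illustration; the conclusion only requires $|\mathcal{U}|-|\mathcal{B}| > |\mathcal{C}|$.

```python
import math

# Hypothetical counts (assumptions, not real data):
U = 10**12   # |U|: all things under the Sun
B = 10**11   # |B|: black things
C = 10**7    # |C|: crows (every crow is assumed black, so C is a subset of B)

# Information (in bits) conveyed by each message, per the derivation above:
# reduction in uncertainty = log2(prior possibilities) - log2(remaining possibilities)
info_crow = math.log2(U) - math.log2(C)          # message: "it is a crow"
info_nonblack = math.log2(U) - math.log2(U - B)  # message: "it is not black"

print(f"'crow' message:      {info_crow:.2f} bits")
print(f"'non-black' message: {info_nonblack:.2f} bits")
```

With these counts the "crow" message carries about 16.6 bits while the "non-black" message carries about 0.15 bits, which mirrors the inequality derived above.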