Suppose you have a set of 200 videos, 100 labeled “Yes” (Like) and 100 labeled “No” (Dislike). You now have two choices of attributes to ask about:
A is a binary attribute. If you ask about attribute A you get two resulting sets, one with 80 Yes and 40 No, and the other with 20 Yes and 60 No.
B is a binary attribute. If you ask about attribute B you get two resulting sets, one with 100 Yes and 75 No, and the other with 0 Yes and 25 No.
Which of these two attributes is the more informative one to ask about?
I believe the answer is A. If you guess the majority label in each subset, the first subset of A gives an 80/120 ≈ 67% chance of being correct and the second gives a 60/80 = 75% chance, so you expect 80 + 60 = 140 correct guesses out of 200. The same calculation for B gives 100 + 25 = 125 correct guesses. However, my friend tells me I am misunderstanding the meaning of "informative."
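The counting argument above can be checked with a short sketch. The function name `correct_guesses` and the `(yes, no)` tuple layout are illustrative choices, not part of the original problem; the sketch assumes you always guess the majority label within each subset.

```python
def correct_guesses(subsets):
    """Count correct guesses when guessing the majority label in each subset.

    `subsets` is a list of (yes_count, no_count) pairs, one per answer
    to the attribute question.
    """
    return sum(max(yes, no) for yes, no in subsets)


# The two candidate splits of the 200 videos described in the question.
split_a = [(80, 40), (20, 60)]
split_b = [(100, 75), (0, 25)]

correct_a = correct_guesses(split_a)
correct_b = correct_guesses(split_b)
print(correct_a, correct_b)  # 140 125
```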
The information content of an outcome is defined to be $-\log_2 (p)$, where $p$ is the probability of the outcome. (Reference: Information Theory, Inference, and Learning Algorithms by David J.C. MacKay).
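As a quick illustration of this definition, here is a minimal sketch; the helper name `information_content` is made up for this example. An outcome with probability 1/2 carries 1 bit, and one with probability 1/8 carries 3 bits, so rarer outcomes carry more information.

```python
import math


def information_content(p):
    """Information content, in bits, of an outcome with probability p."""
    return -math.log2(p)


fair_coin_bits = information_content(0.5)
one_in_eight_bits = information_content(0.125)
print(fair_coin_bits)     # 1.0
print(one_in_eight_bits)  # 3.0
```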
Option A is equivalent to drawing a random sample of 120 videos, without replacement, from the 200 and finding that the sample contains 80 Yeses. The probability of this event is $$p_A = \frac{\binom{100}{80} \binom{100}{40}}{\binom{200}{120}} \approx 4.473 \times 10^{-9}$$ so the information content is $-\log_2(p_A) \approx 27.74$ bits.
Option B is equivalent to drawing a random sample of 175 videos, without replacement, from the 200 and finding that the sample contains 100 Yeses. The probability of this event is $$p_B = \frac{\binom{100}{100} \binom{100}{75}}{\binom{200}{175}} \approx 5.363 \times 10^{-9}$$ so the information content is $-\log_2(p_B) \approx 27.47$ bits.
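The two probabilities can be recomputed exactly with integer binomial coefficients. This is a sketch under the same sampling model as above; the function name `split_probability` and its parameters are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb, log2


def split_probability(yes_in_sample, no_in_sample, total_yes=100, total_no=100):
    """Hypergeometric probability that a random sample of size
    yes_in_sample + no_in_sample, drawn without replacement, contains
    exactly yes_in_sample Yes videos."""
    sample_size = yes_in_sample + no_in_sample
    ways = comb(total_yes, yes_in_sample) * comb(total_no, no_in_sample)
    return ways / comb(total_yes + total_no, sample_size)


p_a = split_probability(80, 40)    # option A: 80 Yes, 40 No in the first subset
p_b = split_probability(100, 75)   # option B: 100 Yes, 75 No in the first subset

bits_a = -log2(p_a)
bits_b = -log2(p_b)
print(f"A: p = {p_a:.3e}, information = {bits_a:.2f} bits")
print(f"B: p = {p_b:.3e}, information = {bits_b:.2f} bits")
```

The printed values should agree with the figures quoted above to the precision shown. Only the first subset of each split is passed in, because the second subset is determined once the first is known.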
Therefore, by this measure, option A has the slightly higher information content.