Set up: Let's say I have a bag with 12 notes containing strings of text made up of the letters A-L.
Action: Someone pulls one note out from that hat, then at random they choose 2 letters written on their note, and tell those letters to me.
Goal: I'm trying to calculate the average amount of information I get by knowing any 2 letters off their note.
Case study:
To make it super flushed out, let's assume these are the notes in the hat:
- ACEFHJL
- ABDFGIK
- BCEGHJL
- ACDFHIK
- BDEGIJL
- ACEFHJK
- BDFGIKL
- ACEGHJL
- ABDFHIK
- BCEGIJL
- ACDFHJK
- BDEGIKL
Example 1:
If for instance, this person tells me their note contains the letters A and B, I know they are either holding note number 2, or note number 9, since those are the two notes containing both A and B. This therefore, cuts the space of possibilities from 12 down to 2, equivalent to about 2.58 bits of information.
Example 2:
If on the other hand, they tell me their note contains the letters A and F, it would mean that the note they are holding is number 1, 2, 4, 6, 9, or 11, since they all contain both A and F. This in turn only cuts the space in possibilities in half, thus providing only 1 bit of information.
Recap:
So, we see that knowing some 2-letter combinations (like AB) provide more information than others (like AF). I would like calculate the average amount of information I get by knowing any 2 letters? That is to say, if we repeated this exercise a million times, each time, this person would shuffle the notes in the hat and pull out one. Then at random tell me 2 letters off of that note. How much information will I gain on average?
Flushing it all out:
Here's another example. Let's say this time I have a bag with the following notes in it:
- ACEG
- ABDF
- BCEG
- ACDF
- BDEG
- ACEF
- BDFG
Below is a table with the information gained by knowing the selected note contains any 2 letters
| Letter combo | p(E) | I(E) | Appears in: |
|---|---|---|---|
| AB | 0.14 (1/7) | 2.84 | 2 |
| AC | 0.43 (3/7) | 1.22 | 1, 4, 6 |
| AD | 0.29 (2/7) | 1.79 | 2, 4 |
| AE | 0.29 (2/7) | 1.79 | 1, 6 |
| AF | 0.43 (3/7) | 1.22 | 2, 4, 6 |
| AG | 0.14 (1/7) | 2.84 | 1 |
| BC | 0.14 (1/7) | 2.84 | 3 |
| BD | 0.43 (3/7) | 1.22 | 2, 5, 7 |
| BE | 0.29 (2/7) | 1.79 | 3, 5 |
| BF | 0.29 (2/7) | 1.79 | 2, 7 |
| BG | 0.43 (3/7) | 1.22 | 3, 5, 7 |
| CD | 0.14 (1/7) | 2.84 | 4, |
| CE | 0.43 (3/7) | 1.22 | 1, 3, 6 |
| CF | 0.29 (2/7) | 1.79 | 4, 6 |
| CG | 0.29 (2/7) | 1.79 | 1, 3 |
| DE | 0.14 (1/7) | 2.84 | 5 |
| DF | 0.43 (3/7) | 1.22 | 2, 4, 7 |
| DG | 0.29 (2/7) | 1.79 | 5, 7 |
| EF | 0.14 (1/7) | 2.84 | 6 |
| EG | 0.43 (3/7) | 1.22 | 1, 3, 5 |
| FG | 0.14 (1/7) | 2.84 | 7 |
How do I compute the average information gained by selecting a random row from this table?