I understand that information is a measure of the "surprise" in registering a particular outcome, so it has something to do with the recipient of the message.
In the case where I told a machine something like "Sam likes cake.", how would I be able to quantify the information in that message? I could suppose there is a mechanism inside it that can calculate the probability of what I say being either true or false, but then what? I can define:
Given a probability distribution $\{p(x_{k})\}_{k \le n}$, the information associated with measuring a particular outcome is $I = -\log_{2}(p(x_{k}))$; a message is a string of binary code that compresses an English sentence into as few $0$s and $1$s as possible; and what I mean by the machine "knowing" is that it's able to map the message to an event in a list it has, each event having a certain probability of happening.
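To make the definition concrete, here is a minimal sketch of computing that self-information (surprisal) for an outcome, assuming a hypothetical prior probability of $0.25$ for the event "Sam likes cake" (the value is purely illustrative):

```python
import math

def self_information(p: float) -> float:
    """Self-information (surprisal) in bits of an outcome with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)

# Hypothetical: the machine's event list assigns "Sam likes cake" probability 0.25.
print(self_information(0.25))  # → 2.0 bits
print(self_information(1.0))   # → 0.0 bits: a certain event carries no surprise
```

Rare events (small $p$) yield large surprisal, and a certain event yields zero, which matches the intuition that information measures surprise.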
Do I just take the log of the inverse of that probability and say that's the amount of information? What do I do? It somehow feels like the amount of information should depend on
- The difference between what it has currently mapped
- The likelihood of receiving that specific message
But I don't know how to combine these two to arrive at the information.