There are N items, say three: call them A, B, and C. Each item carries a bit (0 or 1), and each bit has a prior probability of being 1: call these p(A), p(B), and p(C).
There is some arbitrary information structure linking the items, e.g.:
- Example 1: if A = 0, then B = 0 and C = 0; if A = 1, then the posteriors are p(B) = 0.5 and p(C) = 0.5.
- Example 2: if A = 0, then B = 0; if B = 0, then C = 0; if A = 1, then the posteriors are p(B) = 0.2 and p(C) = 0.7.
Each item can be "read", i.e. its value can be learned, by paying a price of 1. After reading an item, one performs a Bayesian update of the probabilities of the connected items.
You are required to learn the value of the bit associated with each and every item. Which item should be read first to minimize the expected total cost?
First best: an answer with a nice formalism that makes this kind of problem easy to tackle (e.g., would it be useful to represent the information structure as a matrix, and how? Is this a graph, and if so, does that matter? Is there a notion of "information" that would let me equivalently ask "which node contains the most information"?).
Second best: references, or at least a few good keywords, so I can do my own reading on the topic. I bet volumes have been written on this... but I don't even know where to start. I am not a mathematician and I don't know the right keywords to search for.
It would also be nice to consider possible generalizations, e.g.
- only a subset of the items is required
- each item has a different cost (instead of 1)
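For a small number of items, both the base problem and the unequal-cost generalization can be solved exactly by recursion over the joint distribution: if every bit is already determined, the remaining cost is 0; otherwise take the minimum over the undetermined items of that item's reading cost plus the probability-weighted cost after conditioning on its value. A minimal Python sketch (the joint for Example 1 is hard-coded with a hypothetical prior p(A) = 0.5; all numbers are illustrative):

```python
def expected_cost(joint, costs):
    """Minimum expected total cost to learn every bit.

    joint: dict mapping outcome tuples (b1, ..., bN) -> probability.
    costs: reading price of each item (all 1 in the base problem).
    """
    n = len(costs)
    undetermined = [i for i in range(n)
                    if len({o[i] for o in joint}) > 1]
    if not undetermined:
        return 0.0  # every remaining bit is implied by what we know
    best = float("inf")
    for i in undetermined:
        total = costs[i]
        for v in (0, 1):
            p_v = sum(p for o, p in joint.items() if o[i] == v)
            if p_v == 0:
                continue
            # Bayesian update: condition on item i = v and renormalize
            cond = {o: p / p_v for o, p in joint.items() if o[i] == v}
            total += p_v * expected_cost(cond, costs)
        best = min(best, total)
    return best

# Example 1 with hypothetical prior p(A) = 0.5:
# A = 0 forces B = C = 0; A = 1 makes B and C independent fair coins.
joint = {(0, 0, 0): 0.5, (1, 0, 0): 0.125, (1, 0, 1): 0.125,
         (1, 1, 0): 0.125, (1, 1, 1): 0.125}
print(expected_cost(joint, [1, 1, 1]))  # → 2.0
```

With these numbers the optimum is reached by reading A first (expected cost 1 + 2·p(A) = 2), while starting with B or C costs 2.25 in expectation; different reading prices just change the `costs` list.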
Thank you in advance.
I would go with conditional entropy: $H(A,B,C|readings)$, where $readings$ is the result of the measurements made so far. Clearly, if you "read" all three items, then no uncertainty is left, and indeed
$$ H(A,B,C|A,B,C)=0. $$
Moreover, you have $H(A,B,C|A)=H(B,C|A)$. If you want to know which reading leaves you with the least uncertainty about the rest, you should choose the letter that minimizes this conditional entropy. In other words, if
$$ H(A,B,C|C) < H(A,B,C|A) < H(A,B,C|B)$$
then you should start with $C$. Your next reading should be $B$ if
$$ H(A,B,C|C,B) < H(A,B,C|C,A).$$
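These conditional entropies are easy to compute numerically from the joint distribution. A short sketch using the Example 1 joint from the question, with a hypothetical prior p(A) = 0.5 (illustrative numbers only):

```python
from math import log2

# Example 1 with hypothetical prior p(A) = 0.5:
# A = 0 forces B = C = 0; A = 1 makes B and C independent fair coins.
joint = {(0, 0, 0): 0.5, (1, 0, 0): 0.125, (1, 0, 1): 0.125,
         (1, 1, 0): 0.125, (1, 1, 1): 0.125}

def cond_entropy_after_reading(joint, i):
    """H(A,B,C | X_i) = sum_v P(X_i = v) * H(joint | X_i = v), in bits."""
    total = 0.0
    for v in (0, 1):
        p_v = sum(p for o, p in joint.items() if o[i] == v)
        if p_v == 0:
            continue
        h = -sum((p / p_v) * log2(p / p_v)
                 for o, p in joint.items() if o[i] == v and p > 0)
        total += p_v * h
    return total

for i, name in enumerate("ABC"):
    print(f"H(A,B,C|{name}) = {cond_entropy_after_reading(joint, i):.3f}")
```

With these numbers, $H(A,B,C|A) = 1.0$ bit while conditioning on $B$ or $C$ leaves about 1.19 bits, so the rule says to read $A$ first here.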
Plain entropy gives the same answer: $H(reading)$ is the information a given reading contains. By the chain rule of entropy, the condition above is equivalent to
$$ H(C)>H(A)>H(B)$$
i.e., you should again start with $C$. Your next reading should be $B$ if
$$ H(B,C)>H(A,C). $$
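The equivalence is just the chain rule, $H(A,B,C|X) = H(A,B,C) - H(X)$: maximizing the marginal entropy of the item you read is the same as minimizing the conditional entropy of everything given that reading. A quick numerical check on the hypothetical Example 1 joint (illustrative prior p(A) = 0.5):

```python
from math import log2

# Example 1 with hypothetical prior p(A) = 0.5:
# A = 0 forces B = C = 0; A = 1 makes B and C independent fair coins.
joint = {(0, 0, 0): 0.5, (1, 0, 0): 0.125, (1, 0, 1): 0.125,
         (1, 1, 0): 0.125, (1, 1, 1): 0.125}

def H(dist):
    """Shannon entropy in bits of a probability table (dict of probs)."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, i):
    """Marginal distribution of the i-th item."""
    m = {}
    for o, p in joint.items():
        m[o[i]] = m.get(o[i], 0.0) + p
    return m

H_all = H(joint)
for i, name in enumerate("ABC"):
    # chain rule: H(A,B,C | name) = H(A,B,C) - H(name)
    print(f"H({name}) = {H(marginal(joint, i)):.3f}, "
          f"H(A,B,C|{name}) = {H_all - H(marginal(joint, i)):.3f}")
```

Here $H(A) = 1.0$ bit is the largest marginal entropy (versus about 0.81 bits for $B$ and $C$), so the item whose reading carries the most information is the one to read first, matching the conditional-entropy ranking.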
For an easy introduction to information theory, I would recommend Cover and Thomas's book "Elements of Information Theory".