How to define the utility of an information source?


This is a more specific (and, hopefully, clearer) version of a previous question.

The utility of discovering the value of a random variable $X$ can be defined to be its information content. When that value is discovered through an intermediate source $S$, one would want to apportion the utility among the possible values of the source, like this:
\begin{align*}
H(X)&=-\sum_i P(X=x_i)\log P(X=x_i)\\
&=-\sum_i \Big(\sum_j P(X=x_i,S=s_j)\Big)\log P(X=x_i)\\
&=\sum_j \Big(-\sum_i P(X=x_i,S=s_j)\log P(X=x_i)\Big)\\
&=\sum_j P(S=s_j)\Big(-\sum_i P(X=x_i|S=s_j)\log P(X=x_i)\Big)
\end{align*}

which leads to the definition of utility of source $s_j$ for discovering $X$:

$$U(s_j,X)=-\sum_i P(X=x_i,S=s_j)\log P(X=x_i)$$
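As a sanity check, the derivation above says the utilities of the source values should sum exactly to $H(X)$. A minimal sketch, using a made-up joint distribution (the numbers are illustrative, not from the previous question):

```python
import math

# Hypothetical joint distribution P(X = x_i, S = s_j):
# rows index values of X, columns index values of S.
joint = [
    [0.30, 0.10],  # X = x_1
    [0.15, 0.25],  # X = x_2
    [0.05, 0.15],  # X = x_3
]

# Marginal P(X = x_i).
p_x = [sum(row) for row in joint]

# Entropy H(X), in bits.
H = -sum(p * math.log2(p) for p in p_x)

# U(s_j, X) = -sum_i P(X = x_i, S = s_j) log P(X = x_i).
U = [-sum(joint[i][j] * math.log2(p_x[i]) for i in range(len(joint)))
     for j in range(2)]

# The utilities apportion the entropy exactly.
assert abs(sum(U) - H) < 1e-12
```

The assertion holds for any joint distribution, since it is just the marginalization step of the derivation restated numerically.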

The nice properties of this definition include independence of $j$ when $X$ and $S$ are independent, and an intuitive monotonicity: for the example in the previous question, the utility of Tulip is $3/4$, of Iris is $1/2$, and of Daisy is $1/4$ (the order is as expected).

Another quantity we can define is the source quality(?):

$$Q(s_j,X)=-\sum_i P(X=x_i|S=s_j)\log P(X=x_i)=\frac{U(s_j,X)}{P(S=s_j)}$$
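The two expressions for $Q$ agree by the definition of conditional probability, $P(X=x_i|S=s_j)=P(X=x_i,S=s_j)/P(S=s_j)$. A quick numerical check, reusing the same hypothetical joint distribution as above:

```python
import math

# Hypothetical joint distribution (rows: values of X, columns: values of S).
joint = [
    [0.30, 0.10],
    [0.15, 0.25],
    [0.05, 0.15],
]

p_x = [sum(row) for row in joint]                       # P(X = x_i)
p_s = [sum(row[j] for row in joint) for j in range(2)]  # P(S = s_j)

for j in range(2):
    # U(s_j, X) from the joint distribution.
    u = -sum(joint[i][j] * math.log2(p_x[i]) for i in range(len(joint)))
    # Q(s_j, X) directly from the conditional P(X = x_i | S = s_j).
    q = -sum((joint[i][j] / p_s[j]) * math.log2(p_x[i])
             for i in range(len(joint)))
    # Q(s_j, X) = U(s_j, X) / P(S = s_j).
    assert abs(q - u / p_s[j]) < 1e-12
```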

The question is: do these quantities have official names?