Lower bound on uncertainty reduction

147 Views Asked by At

Let $T$ be a set of tuples such that each score tuple $s(t_i)$, $t_i \in T$ is uncertain (i.e., not known deterministically). The score $s(t_i)$ can be represented as a uniform probability density function.

Suppose I want to order the tuples in the set $T$ according to their score, i.e., the higher is the score, the higher is the ranking. Since the score distributions of a two tuples $t_i$ and $t_j$ may overlap, the order of these tuples may not be defined. Thus, multiple orderings may be possible.

Let $S$ be the space of possible orderings of the tuples in $T$, which is associated with an uncertainty value $U(S)$ (computed according to a predefined metrics, e.g., entropy).

My goal is to reduce the uncertainty $U(S)$ of the space of possible orderings. In order to do that, I have access to a source of information that, if queried, produces new information (e.g., information about the relative ordering of two tuples in the set), which can be used in order to reduce the uncertainty $U(S)$ (deleting some of the possible orderings from the space $S$). $N$ possible queries are available. Each time this source is queried, the uncertainty is modified so that the residual uncertainty in the space is less or equal to the original one.

This source can be queried just $Q$ times ($0 < Q \leq N$). Thus, at the $i$-th step, the residual uncertainty is $U(S_{i-1})$ and the $i$-th query produces an uncertainty reduction comprised in $[0,U(S_{i-1})]$. Specifically:

  • Step 0: U(S), no questions asked
  • Step 1: $0 \leq U(S_1) \leq U(S)$, one queries asked
  • Step 2: $0 \leq U(S_2) \leq U(S_1)$, two queries asked
  • ...
  • Step $Q$: $0 \leq U(S_Q) \leq U(S_{Q-1})$, $Q$ queries asked

Notice that two different queries may bring to different uncertainty reductions, i.e., a query may be more promising than another one.

My question is: is it possible to compute a lower bound on the residual uncertainty after $Q$ queries (that is not $lb = 0$), given the fact that I don't know a-priori the sequence of queries that will be asked to the source of information?