I have a random variable $U$ that is a function of individual binary random variables $U_i$ as follows: $U = \sum_{i=1}^N \bigl(A_iU_i + B_i(1 - U_i)\bigr)$
The $U_i$'s are all independent of one another, but they are not identically distributed. The probabilities $P(U_i = 1)$ and $P(U_i = 0)$ are known and fixed for each $U_i$.
Now, instead of getting data from all $N$ of the $U_i$'s, I have to select a subset of $n$ ($n < N$) of them. The objective is to approximate the value of $U$ as closely as possible while using only those $n$ variables. In other words, I want to reduce the absolute deviation between $U$ (which uses all $N$ sensors) and $\hat{U}$ (which uses only $n$ sensors). Which $U_i$'s should be selected in this case? Is there any way of measuring the amount of "information" given by a particular $U_i$?
I have looked at Fisher information and Shannon information, but I am not sure how to actually quantify the "information" given by a single $U_i$. Any suggestions on this?
The problem can be solved using a variance criterion.
Write $U = \sum_{i=1}^N \bigl(A_iU_i + B_i(1 - U_i)\bigr) = \sum_{i=1}^N (A_i-B_i)U_i + \sum_{i=1}^N B_i$, where the $U_i$ are $N$ independent Bernoulli random variables with known parameters $p_i$. Since the second sum is constant, $$ V(U) = \sum_{i=1}^N (A_i-B_i)^2 V(U_i), \qquad V(U_i) = p_i(1-p_i). $$
If $\hat{U}$ denotes the approximation of $U$ built from $n$ sensors (with each unobserved $U_i$ replaced by its mean $p_i$), the variance of the deviation $U - \hat{U}$ is the sum of the terms $(A_i-B_i)^2 V(U_i)$ over the sensors left out. It is therefore minimized by selecting the $n$ greatest among the values $(A_i-B_i)^2 V(U_i)$, $i=1,\ldots,N$.
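A minimal sketch of this selection rule in NumPy, with made-up example values for $A_i$, $B_i$, and $p_i$ (none of these numbers come from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem instance (assumed values, for illustration only)
N, n = 10, 4
A = rng.normal(size=N)
B = rng.normal(size=N)
p = rng.uniform(0.1, 0.9, size=N)   # known P(U_i = 1)

# Score each sensor by its contribution to V(U):
# (A_i - B_i)^2 * V(U_i), with V(U_i) = p_i (1 - p_i)
scores = (A - B) ** 2 * p * (1 - p)

# Keep the n sensors with the largest variance contribution
selected = np.argsort(scores)[-n:]

def estimate(u, keep):
    """Approximate U: observe U_i for i in `keep`, replace the rest by E[U_i] = p_i."""
    u_hat = p.copy()
    u_hat[keep] = u[keep]
    return np.sum((A - B) * u_hat) + np.sum(B)
```

A quick simulation comparing this choice against keeping the $n$ smallest-score sensors shows the deviation $U - \hat{U}$ has smaller variance under the greedy selection, as the argument above predicts.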