Mean of values ${1 \over m} {\sum_{i=1}^m {s_i \over t_i}}$ vs. Value of sums ${\sum_{i=1}^m {s_i} \over \sum_{i=1}^m {t_i}}$, What is the relation?

96 Views Asked by At

Similar to:

Average of products VS. product of averages

I have

$$ R = \{(s_1, t_1), (s_2, t_2), ..., (s_n, t_n)\} $$

With the property that, for all pairs $(s_i, t_i)$, $0 \le s_i < t_i$.

$$ \forall (s_i, t_i) : 0 \le s_i < t_i $$

The value of one pair, $(s_i, t_i)$ is $s_i \over t_i$

I want to select an $m$ sized subset where $m < n$ that maximizes value.

One way to compute the value of a subset $P \subseteq R$ is to take the mean of the values of the individual pairs:

$$p(P) = {1 \over m} {\sum_{i=1}^m {s_i \over t_i}}$$

Another way is to compute the value of the summed individual elements from the pairs:

$$q(P) = {\sum_{i=1}^m {s_i} \over \sum_{i=1}^m {t_i}}$$

Let $P'$ be the subset $P$ of size $m$ such that $p(P)$ is maximized. (This is easy. Select the $m$ elements of $R$ with highest value.)

Let $Q'$ be the subset $Q$ of size $m$ such that $q(Q)$ is maximized. (I think this requires combinatorial search.)

What is the relationship between $p(P')$ and $q(Q')$?

Will $p(P') \le q(Q')$ always?

Which is the truer maximum value?

1

There are 1 best solutions below

7
On BEST ANSWER

An example with $p(P') > q(Q')$ is this: Let $n=3,m=2$, and define the set: $$ \{(1,1), (1,200), (10^{-6}, 10^6)\} $$ where the pair $(10^{-6}, 10^6)$ was thrown in as an undesirable option to ensure $m<n$. We get: \begin{align} p(P') &= \frac{1}{2}\left[\frac{1}{1} + \frac{1}{200}\right] = 0.5025 \\ q(Q') &= \frac{1 + 1}{1+200} \approx 0.00995 \end{align}

Now, your question about "which is the truer maximum value" does not really make sense without a definition of "truer." The $p()$ and $q()$ functions are just different functions and so they can have different max values.


To find an $\epsilon$-approximation to the optimal set $Q'$ very quickly, you can do the following. (This is similar to what I did for stochastic ratio problems in Section V of this paper: http://ee.usc.edu/stochastic-nets/docs/renewal-optimization.pdf)

Problem setup:

-Fix $n$ and $m$ as a positive integers with $m \leq n$.

-Fix real numbers $\{s_1, ..., s_n\}$.

-Fix positive real numbers $\{t_1, ..., t_n\}$.

Define $\mathcal{A}_m$ as the set of all subsets of $\{1, ..., n\}$ that contain $m$ elements. Define: \begin{align} Q' &= \arg \max_{Q \in \mathcal{A}_m} \left[\frac{\sum_{i\in Q} s_i}{\sum_{i\in Q} t_i} \right] \\ M &= \max_{Q \in \mathcal{A}_m} \left[\frac{\sum_{i\in Q} s_i}{\sum_{i\in Q} t_i} \right] \end{align}

Fix $\epsilon>0$. We want to find a set $Q \in \mathcal{A}_m$ that gives $\frac{\sum_{i\in Q} s_i}{\sum_{i\in Q} t_i} \geq M-\epsilon$.

Solution approach:

Define the following function $g:\mathbb{R}\rightarrow\mathbb{R}$ by: $$ g(\mu) = \max_{Q \in \mathcal{A}_m} \left[\sum_{i \in Q} s_i - \mu \sum_{i\in Q} t_i\right] $$

It is not difficult to show:

1) For a given $\mu \in \mathbb{R}$, the value $g(\mu)$ can be easily found by selecting the $m$ indices in $\{1, ..., n\}$ with the largest values of $(s_i -\mu t_i)$.

2) The function $g(\mu)$ is strictly decreasing in $\mu$.

3) $g(\mu)=0$ if and only if $\mu=M$.

So finding an $\epsilon$-approximation reduces to approximating the zero of the decreasing function $g(\mu)$. We can just run a simple bisection search.