Which number minimizes this sum?

529 Views Asked by At

Suppose we have a set of $n$ positive integers, $s_1 \leq s_2 \leq s_3 \leq \cdots \leq s_n$. Given two positive integers $a, b$, which number $x$ minimizes the following sum?

$$\sum_{s_i < x} a(x - s_i) + \sum_{s_i > x} b(s_i -x)$$

I've only figured out that if $a=b$, then the answer is the median of the set. However, I can't figure out what happens if $a \neq b$.

3

There are 3 best solutions below

0
On BEST ANSWER

Consider any interval $(s_k, s_{k+1})$ where $s_k < s_{k+1}$. For any $x$ in this interval, there are $k$ elements less than $x$, and $n-k$ elements greater than $x$. So long as we remain in this interval, increasing $x$ by some small amount $\Delta x$ increases the left-hand sum by $ak\Delta x$, but decreases the right-hand sum by $b(n-k)\Delta x$.

Keeping that principle in mind: If there exists an $k$ such that

$$ ak = b(n-k) $$

then the minimal values of $x$ are any of those within the interval $[s_k, s_{k+1}]$ (which may contain only one value, if $s_k = s_{k+1}$).

Otherwise, we find the unique positive $k$ such that

$$ a(k-1) < b(n-k+1) $$

but

$$ ak > b(n-k) $$

Such an $k$ is guaranteed to exist for positive integers $a$ and $b$ (why?), and then the minimum is reached at $x = s_k$.

3
On

Let's call the sum f(x).

If $x<s_1$ then all $s_i$ are greater than x, so f is the sum of n terms each of which has slope -b. So the slope of f in this region is -nb.

As x increases, it becomes greater than $s_1$. In the region $(s_1, s_2)$ f is the sum of 1 term with slope a and (n-1) terms with slope -b. So f has slope a - (n-1)b = a + b - nb.

And in general, if x lies between $s_i$ and $s_{i+1}$ then at this point f has slope ka - (n-k)b = k(a+b) - nb. Eventually x is greater than $s_n$, and the slope of f becomes na.

To minimise f you just need to find the value of k at which the slope of f changes from negative to positive (remember that a and b are both positive values).

3
On

$$\sum_{s_i < x} a(x - s_i) + \sum_{s_i > x} b(s_i -x) = x ( a \sum_{s_i < x} 1 - b \sum_{s_i > x} 1) - a \sum_{s_i < x} s_i + b \sum_{s_i > x} s_i $$ $$ = - b n x + x i ( a +b) - (a+b) \sum_{s_i < x} s_i + b \sum s_i $$ where $i$ is the number where $s_i < x < s_{i+1}$. So $x$ must minimize

$$ F(i) = (i - \frac{b n}{a+b}) x -\sum_{k = 1}^i s_k $$

We have the slope $G(i) = F(i) - F(i-1)$ with

$$ G(i) = (i - \frac{b n}{a+b}) x(i) - (i-1 - \frac{b n}{a+b}) x(i-1) ) - s_i $$

Let $x(i)$ always be in the center of two sum terms, then $x(i)= 1/2(s_{i+1} + s_i)$ and $x(i-1)= 1/2(s_{i} + s_{i-1})$. So

$$ 2 G(i) = (i - \frac{b n}{a+b}) (s_{i+1} + s_i) - (i-1 - \frac{b n}{a+b}) (s_{i} + s_{i-1}) ) - 2 s_i $$

We can now calculate $i$ at the point where $G(i)$ changes sign, so approximately setting $G(i) =0$ we get

$$ 0 = (i - \frac{b n}{a+b}) (s_{i+1} - s_{i-1}) - s_i + s_{i-1} $$

which is approximately $$ i \simeq \frac{b n}{a+b} + \frac{ s_i - s_{i-1} }{s_{i+1} - s_{i-1}} \\ = \frac{b n}{a+b} + \frac{ 1}{1 + \frac{s_{i+1} - s_{i}}{s_i - s_{i-1} }} $$ and again $x(i)= 1/2(s_{i+1} + s_i)$. Note that the last fraction is of the form $\epsilon = 1/(1+x)$ with $x>0$ so it can attain only values between $0$ and $1$. Therefore we have $$ i \simeq \frac{b n}{a+b} + \epsilon $$ with $\epsilon \in (0,1)$. So we obtain for $x$ the $b/(a+b)$th quantile of the range $\{s_1 \cdots s_n\}$, if quantiles are understood in a continuous sense from $0$ to $1$. See the comment by Rahul above who stated that already, and the following discussion.

As a first special case, for equidistant $s_i - s_{i-1} = d$, we have

$$ i \simeq \frac{b n}{a+b} + \frac{d}{2d} = \frac{b n}{a+b} + \frac12 $$ as expected.

As a second special case, for $a=b$ we have

$$ i \simeq \frac{n}{2} + \epsilon $$

so this is the median, as already found by the OP.