Find recurrence relation for sum of absolute deviations in a sequence

121 Views Asked by At

Given an ordered sequence $S_n = s_1, s_2, ..., s_n$ we define its cost as the sum of absolute deviations from any median: $$ cost(S_n) = \sum_{i=1}^{n} \left| s_i - median(S_n) \right|\text{, where } median(S_n) = s_{\left\lceil \frac{n}{2}\right \rceil} \text{ or } s_{\left\lceil \frac{n+1}{2}\right \rceil}. $$

By adding or removing an arbitrary value from $S_n$, how can the new cost be expressed based on $cost(S_n)$, the previous median and the arbitrary value ?

I've tried to deduce a recurrence relation by using the cost definition from above, and splitting into cases based on the relation between $x$ and the median, however my results are incorrect:

  • adding a new value $x$, resulting in the sequence $S_{n+1}$: $$ cost(S_{n+1}) = cost(S_n) + \left| x - median(S_{n+1}) \right| + median(S_n) - median(S_{n+1})$$
  • removing an existing value $x$, resulting in the sequence $S_{n-1}$: $$ cost(S_{n-1}) = cost(S_n) - \left| x - median(S_{n}) \right| + median(S_n) - median(S_{n-1})$$

Is there a simpler way to deduce this?

1

There are 1 best solutions below

0
On BEST ANSWER

Let $S_n = (s_1,\ldots,s_n)$ with $s_1 \le \ldots \le s_n$,

  1. First, assume that you add a new value $x$.

If $n$ is even, set $n=2p$. Then the cost of $S_n$ equals $\displaystyle \sum_{i=1}^{2p}|s_i-s|$ for any $s \in [s_p,s_{p+1}]$. Adding the new value will give a unique median, namely $s_p$, $x$, or $s_{p+1}$ according that $x \le s_p$, $x \in [s_p,s_{p+1}]$, or $x \ge s_{p+1}$. As a result, the increment of the cost is the distance of $x$ to $[s_p,s_{p+1}]$.

If $n$ is even, set $n=2p-1$. Then the unique median of $S_n$ is $s_p$, and $s_p$ will still be a median once the value $x$ is added. As a result, the increment of the cost is $|x-s_p|$.

  1. Now, assume that you remove a value $x$ from $S_n = (s_1,\ldots,s_n)$.

If $n=2p$, the increment of the cost is $-|x-s|$, where $s$ is the unique median after removing $x$, namely $s=s_p$ if $x \le s_{p-1}$ and $s=s_{p-1}$ if $x \ge s_p$. In both cases, the increment is minus the maximum of the the distances from $x$ to $s_{p-1}$ and $s_p$.

If $n=2p+1$, then $s_{p+1}$ is still a median after removing $x$, so the increment is $-|x-s_{p+1}|$.