Given an ordered sequence $S_n = s_1, s_2, ..., s_n$ we define its cost as the sum of absolute deviations from any median: $$ cost(S_n) = \sum_{i=1}^{n} \left| s_i - median(S_n) \right|\text{, where } median(S_n) = s_{\left\lceil \frac{n}{2}\right \rceil} \text{ or } s_{\left\lceil \frac{n+1}{2}\right \rceil}. $$
By adding or removing an arbitrary value from $S_n$, how can the new cost be expressed based on $cost(S_n)$, the previous median and the arbitrary value ?
I've tried to deduce a recurrence relation by using the cost definition from above, and splitting into cases based on the relation between $x$ and the median, however my results are incorrect:
- adding a new value $x$, resulting in the sequence $S_{n+1}$: $$ cost(S_{n+1}) = cost(S_n) + \left| x - median(S_{n+1}) \right| + median(S_n) - median(S_{n+1})$$
- removing an existing value $x$, resulting in the sequence $S_{n-1}$: $$ cost(S_{n-1}) = cost(S_n) - \left| x - median(S_{n}) \right| + median(S_n) - median(S_{n-1})$$
Is there a simpler way to deduce this?
Let $S_n = (s_1,\ldots,s_n)$ with $s_1 \le \ldots \le s_n$,
If $n$ is even, set $n=2p$. Then the cost of $S_n$ equals $\displaystyle \sum_{i=1}^{2p}|s_i-s|$ for any $s \in [s_p,s_{p+1}]$. Adding the new value will give a unique median, namely $s_p$, $x$, or $s_{p+1}$ according that $x \le s_p$, $x \in [s_p,s_{p+1}]$, or $x \ge s_{p+1}$. As a result, the increment of the cost is the distance of $x$ to $[s_p,s_{p+1}]$.
If $n$ is even, set $n=2p-1$. Then the unique median of $S_n$ is $s_p$, and $s_p$ will still be a median once the value $x$ is added. As a result, the increment of the cost is $|x-s_p|$.
If $n=2p$, the increment of the cost is $-|x-s|$, where $s$ is the unique median after removing $x$, namely $s=s_p$ if $x \le s_{p-1}$ and $s=s_{p-1}$ if $x \ge s_p$. In both cases, the increment is minus the maximum of the the distances from $x$ to $s_{p-1}$ and $s_p$.
If $n=2p+1$, then $s_{p+1}$ is still a median after removing $x$, so the increment is $-|x-s_{p+1}|$.