I have a set of numbers and I want to maximise the difference between the mean and the median by removing a given number of elements.
For instance I have the following set:
0 5 10 45
Mean = 15
Median = 7.5
Mean - Median = 7.5
If we choose to remove 1 element, I want to obtain the following:
0 5 45
Mean = 16.66
Median = 5
Mean - Median = 11.66 => MAXIMISED
We can assume that the set is sorted.
Is there an algorithm for doing this?
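For a concrete baseline, the naive approach can be sketched in Python: try every way of removing k elements and keep the subset that maximises mean minus median. (The function name `max_mean_minus_median` is just illustrative.)

```python
from itertools import combinations
from statistics import mean, median

def max_mean_minus_median(values, k):
    """Try every way of removing k elements; return (best diff, removed values)."""
    best = None
    for removed in combinations(range(len(values)), k):
        kept = [v for i, v in enumerate(values) if i not in removed]
        diff = mean(kept) - median(kept)
        if best is None or diff > best[0]:
            best = (diff, [values[i] for i in removed])
    return best

diff, removed = max_mean_minus_median([0, 5, 10, 45], 1)
# removing 10 leaves [0, 5, 45]: mean 16.66..., median 5, diff 11.66...
```

On the example above this removes 10, matching the hand calculation; the cost is exponential in k, which is what the speedups below try to reduce.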
One hint for computing this quickly: precompute the sum of all $N$ numbers, call it $S$. The mean after removing the values $x_1,\cdots,x_k$ is then $$\frac{1}{N-k}\left(S - \sum_{i=1}^{k} x_i\right).$$ If you only want to remove a small part of the elements, computing the above is faster than re-computing the whole sum for each new subset of $x_i$: with $N = 100$ elements and $k = 3$ removed, each configuration needs only 3 subtractions instead of 97 additions.
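In Python, this precomputed-sum trick might look like the following sketch:

```python
values = [0, 5, 10, 45]
N = len(values)
S = sum(values)  # precomputed once for the whole search, here S = 60

def mean_after_removal(removed):
    """Mean of the remaining N - k elements using only k subtractions."""
    return (S - sum(removed)) / (N - len(removed))

# removing the single value 10: (60 - 10) / 3 = 16.66...
m = mean_after_removal([10])
```

Each candidate subset now costs O(k) instead of O(N), which is the 3-subtractions-versus-97-additions saving described above.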
For the index of the median we can take an iterative approach. First compute the median of the full set. Then, for each number we remove: if its index is lower than the index of the current median, the index of the new median (in the original array) increases by $1/2$; if it is higher, it decreases by $1/2$; and if it is the same, nothing changes. Why $1/2$? With $n$ elements the median sits at position $(n+1)/2$ (1-based), so after a removal the median of the remaining $n-1$ elements sits at position $n/2$, half a step lower; if the removed element lies below the median, every element from it onward also shifts down by one position, and in original coordinates the two effects give a net change of $+1/2$, while a removal above the median gives $-1/2$. We then need some special handling to check that the final index does not accidentally land on one of the removed numbers.

If we come from signal processing, we can also view this as a kind of adaptive filtering, for instance local normalized averaging, where we set the certainty of every removed element to 0 and of every kept element to 1.
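A minimal sketch of the $\pm 1/2$ update for a single removal, assuming a sorted input; `median_at` and the coordinate conversion at the end are my own scaffolding, not part of the original description:

```python
from statistics import median

values = [0, 5, 10, 45]    # sorted input
m = (len(values) + 1) / 2  # 1-based median position of the full set, here 2.5

def median_at(vals, pos):
    """Read the median off a sorted list given its (possibly half) 1-based position."""
    if pos == int(pos):
        return vals[int(pos) - 1]
    lo, hi = int(pos) - 1, int(pos)  # 0-based neighbours of the half position
    return (vals[lo] + vals[hi]) / 2

# remove the element at 1-based index 3 (the value 10)
r = 3
m = m + 0.5 if r < m else m - 0.5   # the +-1/2 update rule (r != m here)

remaining = [v for i, v in enumerate(values, start=1) if i != r]
# m is still in original coordinates; positions above r shift down by one
m_new = m if m < r else m - 1
best_median = median_at(remaining, m_new)  # 5, matching median([0, 5, 45])
```

The sanity check against `statistics.median` is only there to confirm the rule; in a real search you would keep the half-index and never recompute the median from scratch.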