Median of medians algorithm

10.4k Views Asked by At

I am referring to the algorithm presented here used to find a good pivot: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

My question is I don't quite understand why the elements have to be divided specifically into groups of 5. Why not some other number?

1

There are 1 best solutions below

5
On

The key section of the Wikipedia article says

The median-calculating recursive call does not exceed worst-case linear behavior because the list of medians is 20% of the size of the list, while the other recursive call recurse on at most 70% of the list, making the running time $$T(n) \leq T(n/5) + T(7 \cdot n/10) + O(n).$$

The O($n$) is for the partitioning work (we visited each element a constant number of times, in order to form them into $n/5$ groups and take each median in O($1$) time). From this, one can then show that $$T(n) \leq c \cdot n \cdot (1 + (9/10) + (9/10)^2 + \cdots) \in O(n).$$

using the fact that at most 70% of the list is to one side of the median of the medians with groups of five.

If instead you had groups of three the first inequality would be $$T(n) \leq T(n/3) + T(2 \cdot n/3) + O(n)$$ so you would not get a convergent series in in the second inequality.

There is no reason why you should not use something greater than five; for example with seven the first inequality would be $$T(n) \leq T(n/7) + T(5 \cdot n/7) + O(n)$$ which also works, but five is the smallest odd number (useful for medians) which works.