How can we prove (mathematically) that the complexity of any sorting algorithm that sorts a random array of integers is no better than $O(n\log n)$?

Is there any sorting algorithm faster than Quicksort on a random array of integers?
Asked by Bumbble Comm
It somewhat depends on what you mean by "random array of integers". As pointed out, any comparison-based algorithm needs $\Omega(n \log n)$ comparisons to sort in the worst case. However, if by "random" you mean that your integers are uniform independent random samples between some minimum and maximum value, then you can scan the array to find the min $m$ and max $M$ in $O(n)$ time, and then define $n$ "buckets" to store lists of integers, where the "index" of the $k$th bucket is $m + k(M-m)/n$.

Then, for each integer, you can compute its bucket index in $O(1)$ time (using a few arithmetic operations) and put the integer in that bucket. After all the integers are in buckets, you sort each bucket any way you want, e.g. with merge sort, and concatenating the buckets in order gives you the overall sorted array. The reason this can beat $\Omega(n \log n)$ with high probability is that, if your integers really are uniform random over some range, each bucket contains only 1 integer on average, so only a very small fraction of buckets should hold, say, more than 5 integers. If you double the number of buckets (keeping the indexes equally spaced, now with half the original spacing), the average drops to 0.5 integers per bucket, making overfull buckets even less likely.
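As a sketch of the bucketing idea described above (the function name and the choice of `sorted` for the per-bucket sort are illustrative, not prescribed by the answer):

```python
import random

def bucket_sort(arr):
    """Expected O(n) sort for integers drawn uniformly from some range.

    Sketch of the answer's idea: find min/max in one scan, distribute
    the n elements into n equally spaced buckets, then sort each
    (typically tiny) bucket and concatenate.
    """
    n = len(arr)
    if n <= 1:
        return list(arr)
    m, M = min(arr), max(arr)          # one O(n) scan each
    if m == M:
        return list(arr)               # all elements equal
    buckets = [[] for _ in range(n)]
    for x in arr:
        # A few arithmetic operations map x to a bucket index in [0, n-1].
        k = (x - m) * (n - 1) // (M - m)
        buckets[k].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))          # each bucket holds ~1 element on average
    return out

data = [random.randint(0, 10**6) for _ in range(1000)]
assert bucket_sort(data) == sorted(data)
```

Note that the per-bucket sort can be anything; since buckets are constant-sized in expectation, even an $O(k^2)$ sort per bucket keeps the expected total linear.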
This is only true for comparison-based sorting algorithms (*); these lecture notes by Avrim Blum give a proof, but any good algorithms textbook should provide one. At its core, the argument boils down to: "there are $n!$ possible permutations of $n$ elements, hence one needs $\log n! = \Theta(n\log n)$ bits to uniquely identify one. But a comparison only reveals one bit of information, so $\Omega(n\log n)$ comparisons are required in the worst case."

This (sketchy) argument and the (actual) argument both generalize to the average case (see Section 5.3 of the above lecture notes).
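The counting step in this sketch can be made precise without Stirling's full formula; keeping only the upper half of the sum already gives the bound:

```latex
\log_2 n! \;=\; \sum_{k=1}^{n} \log_2 k
\;\ge\; \sum_{k=\lceil n/2\rceil}^{n} \log_2 \frac{n}{2}
\;\ge\; \frac{n}{2}\,\log_2\frac{n}{2} \;=\; \Omega(n\log n),
```

and conversely $\log_2 n! \le n\log_2 n$, so $\log_2 n! = \Theta(n\log n)$. In decision-tree terms: a comparison tree of depth $d$ has at most $2^d$ leaves, and it needs at least $n!$ leaves to distinguish all input orders, so $2^d \ge n!$, i.e. $d \ge \log_2 n! = \Omega(n\log n)$.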
(*) See e.g. radix sort, which is not comparison-based and has worst-case complexity $O(wn)$, where $w$ is the number of digits (word length) of the keys; $w$ can be greater or smaller than $\log n$.
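For concreteness, a minimal LSD radix sort on base-256 digits (the base and function name are my own choices for the sketch; the footnote's $w$ is the number of digits, i.e. the loop count here):

```python
def radix_sort(arr):
    """LSD radix sort for non-negative integers: O(w * n), where w is the
    number of base-256 digits of the largest key. Not comparison-based:
    it inspects digits rather than comparing whole elements."""
    out = list(arr)
    if not out:
        return out
    shift = 0
    while max(out) >> shift:           # loop once per base-256 digit (w times)
        buckets = [[] for _ in range(256)]
        for x in out:
            buckets[(x >> shift) & 0xFF].append(x)   # stable pass on one digit
        out = [x for b in buckets for x in b]
        shift += 8                     # move to the next digit
    return out

assert radix_sort([170, 45, 75, 90, 802, 24, 2, 66]) == [2, 24, 45, 66, 75, 90, 170, 802]
```

Each pass is stable, which is what lets later (more significant) digit passes preserve the order established by earlier ones.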