Total number of comparisons in Binary Insertion Sort Algorithm

977 Views Asked by At

I've built a custom binary insertion sort algorithm that relies on the user to make the comparison as to what is greater or not. I want to add a progress bar, which means I need a way of calculating the worst-case number of comparisons they will need to do.

I wasn't able to find anything online with a formula for calculating this, so I sat down to think about it practically and I think I came up with something that will return the correct response:

Python example

import math

def totalComparisons(n):
    if n < 3:
        return 1
    total = 0
    for i in range(1, n):
        total += math.floor((n-i)/2) + 1
    return total


print(totalComparisons(4))

This will return 5 for n = 4, which I believe is correct using the following practical example:

Practical example

[A, B, C, D]
1. A - B
2. C - B
3. C - A
4. D - B
5. D - A | D - C

Is this correct? Can this be boiled down to the following?

Formula

$$\sum_{i=1}^{n-1} \lfloor{\frac{(n-1)}{2}}\rfloor + 1 $$

Edit

The above started falling apart when n > 6, I believe this formula works:

Formula

$$\sum_{i=1}^{n-1} \lfloor{(i-\lceil{\frac{i}{2}}\rceil)/2}\rfloor+2 $$

Python

def totalComparisons(n):
    if n < 2:
        return 0;
    if n < 3:
        return 1
    total = 1
    for i in range(1, n-1):
        total += math.floor((i - math.ceil(i/2)) / 2) + 2
    return total


for i in range(13):
   print(f'Combinations for {i} items: {totalComparisons(i)}')

Comparisons for list length 0 items: 0
Comparisons for list length 1 items: 0
Comparisons for list length 2 items: 1
Comparisons for list length 3 items: 3
Comparisons for list length 4 items: 5
Comparisons for list length 5 items: 7
Comparisons for list length 6 items: 10
Comparisons for list length 7 items: 13
Comparisons for list length 8 items: 16
Comparisons for list length 9 items: 19
Comparisons for list length 10 items: 23
Comparisons for list length 11 items: 27
Comparisons for list length 12 items: 31

Edit 2

Fixed Python to get the correct results as from joriki's answer

def totalComparisons(n):
    if n < 2:
        return 0;
    if n < 3:
        return 1
    total = 1
    for i in range(2, n):
        total += math.floor((i - math.ceil(i/2)) / 2) + 2
    return total

Comparisons for list length 0 items: 0
Comparisons for list length 1 items: 0
Comparisons for list length 2 items: 1
Comparisons for list length 3 items: 3
Comparisons for list length 4 items: 5
Comparisons for list length 5 items: 8
Comparisons for list length 6 items: 11
Comparisons for list length 7 items: 14
Comparisons for list length 8 items: 17
Comparisons for list length 9 items: 21
Comparisons for list length 10 items: 25
Comparisons for list length 11 items: 29
Comparisons for list length 12 items: 33
1

There are 1 best solutions below

2
On BEST ANSWER

Since you refused our requests to make the question self-contained by describing your insertion algorithm, I’ll first describe it here as gleaned from the code you linked to. Next time you ask a question here, please do this yourself.

In step $k$, insert the $(k+1)$-th item into the array of $k$ items already sorted. To find the appropriate place, perform a binary search of those $k$ items by comparing with the midpoint (rounded down, but that shouldn’t matter) and recursively applying the binary search to the upper or lower half as long as necessary.

In the worst case, each binary search requires $\left\lfloor\log_2k\right\rfloor+1$ comparisons. Thus the total number of comparisons in the worst case is

\begin{eqnarray} a_n &=& \sum_{k=1}^{n-1}\left\lfloor\log_2k\right\rfloor+1 \\ &=& \sum_{l=1}^{\left\lfloor\log_2n\right\rfloor}l2^{l-1}+(n-2^{\left\lfloor\log_2n\right\rfloor})(\left\lfloor\log_2n\right\rfloor+1) \\ &=& (\left\lfloor\log_2n\right\rfloor-1)2^{\left\lfloor\log_2n\right\rfloor}+1+(n-2^{\left\lfloor\log_2n\right\rfloor})(\left\lfloor\log_2n\right\rfloor+1) \\ &=& n(\left\lfloor\log_2n\right\rfloor+1)-2^{\left\lfloor\log_2n\right\rfloor+1}+1 \;. \\ \end{eqnarray}

Here’s a table of these values:

\begin{array}{c|cc} n&1&2&3&4&5&6&7&8&9&10&11&12\\\hline a_n&0&1&3&5&8&11&14&17&21&25&29&33 \end{array}

The first deviation from your counts is at $a_5=8$. Here the four binary searches require $1$, $2$, $2$, $3$ comparisons in the worst case, so it seems $8$ is the correct answer. The only way to know for sure is to count the comparisons in the execution of the algorithm itself.