Why am I unable to observe $O(n^2)$ time complexity with Bubble Sort?


To confirm that Bubble Sort has an average-case performance of $O(n^2)$, I wrote the following code in Python.

bubble_sort takes a list size $n$, generates and bubble-sorts a random list of that size, and returns the number of comparisons that were performed.

The main code calls bubble_sort for each list size $i$ from 0 to 24, storing in comparison_count_averages[$i$] the mean number of comparisons over 400 randomly generated lists of length $i$. It then prints diffs, the differences between successive elements of comparison_count_averages:

import random

def bubble_sort(list_size):
    rand_ints = random.sample(range(0, list_size), list_size)

    comparison_count = 0
    unsorted = True
    while unsorted:
        unsorted = False
        for i in range(len(rand_ints) - 1):
            comparison_count = comparison_count + 1
            if rand_ints[i] > rand_ints[i + 1]:
                unsorted = True
                rand_ints[i + 1], rand_ints[i] = rand_ints[i], rand_ints[i + 1]

    return comparison_count


list_size = 0
comparison_count_averages = []  # indexed by list size

sort_up_to_list_of_size = 25
number_of_sorts_per_list_length = 400

for i in range(sort_up_to_list_of_size):
    comparison_count = 0
    for j in range(number_of_sorts_per_list_length):
        comparison_count = comparison_count + bubble_sort(list_size)
    comparison_count_averages.append(comparison_count / number_of_sorts_per_list_length)
    list_size = list_size + 1

diffs = []
for i in range(len(comparison_count_averages) - 2):
    diffs.append(int(comparison_count_averages[i + 1] - comparison_count_averages[i]))

print(diffs)

The outputs I am getting are more or less like: [0, 1, 2, 4, 5, 7, 8, 11, 11, 14, 13, 18, 19, 21, 24, 23, 24, 25, 32, 30, 33, 35, 40]

I was not expecting to see values like 23, 24, 25, 32, 30, 33. I am pretty sure I am overlooking something, but what is it?

1 Answer

It looks to me as though comparison_count_averages is the quantity that should be growing as $O(n^2)$. In printing out diffs, you are actually printing out the discrete derivative of comparison_count_averages, and hence it looks like it is increasing linearly!

More formally: you have lists of length $x$ and the number of comparisons is bounded above by $f(x) = m x^2$ for some $m>0$. You are plotting $f(k+1) - f(k)$ at successive integers $k$. But this is a rough approximation to $f'(x)$ via $$ f'(x) \approx \frac{f(k+1) - f(k)}{(k+1) - k} = f(k+1) - f(k), $$ and $f'(x) = 2mx$, which is a linear function.
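The same point can be checked numerically: for an exact quadratic sequence, the first differences grow linearly and the second differences are constant. A minimal sketch (the coefficient $m = 3$ is an arbitrary choice of mine):

```python
# First differences of f(k) = m*k^2 satisfy f(k+1) - f(k) = m*(2k + 1),
# which is linear in k, so the second differences are the constant 2m.
m = 3
f = [m * k * k for k in range(10)]

first_diffs = [f[k + 1] - f[k] for k in range(len(f) - 1)]
second_diffs = [first_diffs[k + 1] - first_diffs[k] for k in range(len(first_diffs) - 1)]

print(first_diffs)   # [3, 9, 15, 21, ...] -- linear in k
print(second_diffs)  # [6, 6, 6, ...]      -- constant 2m
```

This is exactly the pattern in the question's output: noisy, but linearly growing first differences.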

As further empirical testing:

When I initialize rand_ints to be a strictly increasing list by changing line 4 to:

rand_ints = list(range(0, list_size))

I get comparison_count_averages as

[0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0]

and diffs as

[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

These results are consistent with a best-case runtime complexity of $O(n)$.
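In fact, the best case is exact for this implementation, not just asymptotic: a sorted input triggers no swaps, so the while loop performs a single verification pass of $n - 1$ comparisons. A quick check, where count_comparisons is my restatement of the question's bubble_sort, operating on a given list instead of a freshly generated random one:

```python
def count_comparisons(values):
    """The question's bubble sort applied to a given list; returns the comparison count."""
    comparison_count = 0
    unsorted = True
    while unsorted:
        unsorted = False
        for i in range(len(values) - 1):
            comparison_count += 1
            if values[i] > values[i + 1]:
                unsorted = True
                values[i + 1], values[i] = values[i], values[i + 1]
    return comparison_count

# An already-sorted list of length n >= 2 costs exactly n - 1 comparisons:
# one full pass that finds nothing to swap.
for n in range(2, 50):
    assert count_comparisons(list(range(n))) == n - 1
print("best case confirmed: n - 1 comparisons")
```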

Likewise when I initialize rand_ints to be a strictly decreasing list by changing line 4 to:

rand_ints = list(range(list_size, 0, -1))

I get comparison_count_averages as

[0.0, 0.0, 2.0, 6.0, 12.0, 20.0, 30.0, 42.0, 56.0, 72.0, 90.0, 110.0, 132.0, 156.0, 182.0, 210.0, 240.0, 272.0, 306.0, 342.0, 380.0, 420.0, 462.0, 506.0, 552.0]

and diffs as

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44]

These results are consistent with a worst-case runtime complexity of $O(n^2)$.
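Here too the count is exact, not just $O(n^2)$: a reversed list requires $n - 1$ swapping passes plus one final verification pass, each costing $n - 1$ comparisons, for $n(n-1)$ in total. The averages above match this closed form exactly:

```python
# Worst case for this implementation: n passes of (n - 1) comparisons each.
# The first few averages printed above, checked against n * (n - 1):
observed = [0.0, 0.0, 2.0, 6.0, 12.0, 20.0, 30.0, 42.0, 56.0, 72.0]
for n, count in enumerate(observed):
    assert count == n * (n - 1)
print("worst case confirmed: n * (n - 1) comparisons")
```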

Finally, changing line 4 back to initializing a list of random elements

rand_ints = random.sample(range(0, list_size), list_size)

and then running over larger lists (lines 24-35):

# sort_up_to_list_of_size = 25
number_of_sorts_per_list_length = 400
comparison_count_averages = []  # reset before the larger runs
# list_lengths = range(1, 25, 1)  # original
list_lengths = range(5, 100, 5)

# for i in range(sort_up_to_list_of_size):
for n in list_lengths:
    comparison_count = 0
    for j in range(number_of_sorts_per_list_length):
        comparison_count = comparison_count + bubble_sort(n)
    comparison_count_averages.append(comparison_count / number_of_sorts_per_list_length)
#    list_size = list_size + 1

I get comparison_count_averages (from one run; the results are not deterministic, since the lists are random) as

[14.48, 66.105, 162.89, 304.5225, 485.22, 714.3425, 1003.0, 1315.7625, 1670.46, 2089.115, 2547.45, 3047.94, 3614.56, 4200.375, 4872.715, 5581.9425, 6260.94, 7082.175, 7931.015]

and diffs as

[51, 96, 141, 180, 229, 288, 312, 354, 418, 458, 500, 566, 585, 672, 709, 678, 821]

These results are consistent with an average-case runtime complexity of $O(n^2)$.
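A complementary sanity check is a doubling experiment: if the average cost grows like $C n^2$, doubling the list size should roughly quadruple the average comparison count. A sketch, where count_comparisons and average_comparisons are my helpers wrapping the same algorithm; the ratio tends to land a bit above 4 because the pass count also has a lower-order term:

```python
import random

def count_comparisons(values):
    """The question's bubble sort applied to a given list; returns the comparison count."""
    comparison_count = 0
    unsorted = True
    while unsorted:
        unsorted = False
        for i in range(len(values) - 1):
            comparison_count += 1
            if values[i] > values[i + 1]:
                unsorted = True
                values[i + 1], values[i] = values[i], values[i + 1]
    return comparison_count

def average_comparisons(n, trials=400):
    """Mean comparison count over `trials` random permutations of size n."""
    return sum(count_comparisons(random.sample(range(n), n)) for _ in range(trials)) / trials

ratio = average_comparisons(40) / average_comparisons(20)
print(ratio)  # roughly 4, consistent with quadratic growth
```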