I am seeking to confirm that Bubble Sort has an average performance of $O(n^2)$ I wrote the following code in Python.
bubble_sort takes a list of size $n$ and sorts it, and returns the number of comparisons that were performed.
The main code calls bubble_sort to sort lists of size $i$ from 0 to 25, and generates comparison_count_averages[$i$] as the mean number of comparisons that occurred when sorting 400 randomly-generated lists of length $i$.
It then prints out diffs as the difference in successive elements of comparison_count_averages:
import random
def bubble_sort(list_size):
rand_ints = random.sample(range(0, list_size), list_size)
comparison_count = 0
unsorted = True
while unsorted:
unsorted = False
for i in range(len(rand_ints) - 1):
comparison_count = comparison_count + 1
if rand_ints[i] > rand_ints[i + 1]:
unsorted = True
rand_ints[i + 1], rand_ints[i] = rand_ints[i], rand_ints[i + 1]
return comparison_count
list_size = 0
comparison_count_averages = [] # aligned by index where index = list size..
sort_up_to_list_of_size = 25
number_of_sorts_per_list_length = 400
for i in range(sort_up_to_list_of_size):
comparison_count = 0
for j in range(number_of_sorts_per_list_length):
comparison_count = comparison_count + bubble_sort(list_size)
comparison_count_averages.append(comparison_count / number_of_sorts_per_list_length)
list_size = list_size + 1
diffs = []
for i in range(len(comparison_count_averages) - 2):
diffs.append(int(comparison_count_averages[i + 1] - comparison_count_averages[i]))
print(diffs)
The outputs I am getting are more or less like: [0, 1, 2, 4, 5, 7, 8, 11, 11, 14, 13, 18, 19, 21, 24, 23, 24, 25, 32, 30, 33, 35, 40]
I was not expecting to see values like 23, 24, 25, 32, 30, 33. I am pretty sure I am overlooking something, but what is it?
It looks to me as though
comparison_count_averagesis the thing that should be increasing as $O(n^2)$. In printing outdiffs, I think that you are actually printing out the derivative ofcomparison_count_averages.... and hence this looks like it is increasing linearly!More formally: You have lists of length $x$ and the number of comparisons is bounded above by $f(x) = m x^2$ for some $k>0$. You are plotting out $f(k+1) - f(k)$ at successive integers k. But this is a rough approximation to $f'(x)$ via $$ f'(x) \approx \frac{f(k+1) - f(k)}{(k+1) - (k)} = f(k+1) - f(k) $$ and $f'(x) = 2mx$ which is a linear function.
As further empirical testing
When I initialize
rand_intsto be a strictly increasing list by changing line 4 to:I get
comparison_count_averagesasand
diffsasThese results are consistent with a best-case runtime complexity of $O(n)$.
Likewise when I initialize
rand_intsto be a strictly decreasing list by changing line 4 to:I get
comparison_count_averagesasand
diffsasThese results are consistent with a worst-case runtime complexity of $O(n^2)$.
Finally changing line 4 back to initializing a list of random elements
and then running over larger lists (lines 24-35):
I get
comparison_count_averages(in one run, is not deterministic as the lists are random) asand
diffsasThese results are consistent with an average-case runtime complexity of $O(n^2)$.