Why to calculate "Big O" if we can just calculate number of steps?


I never really understood how we use Big O to estimate algorithm efficiency. I know that $f\left(n\right) = O\left(g\left(n\right)\right)$ as $n \to +\infty$ if there exists some constant $C > 0$ such that $\left|f\left(n\right)\right| \leqslant C\left|g\left(n\right)\right|$. But I can't understand what the purpose of it is, or why it makes sense.

Why do we even calculate it? It tells us neither the exact number of steps in our algorithm nor even an approximate number of steps. I mean, suppose that our algorithm does $10000000n^2 + 10^{1000}$ steps for an input of length $n$. We have $O\left(n^2\right)$ in this case, but what does that tell us? You may say that "with asymptotic notation you are trying to describe what will happen when the size $\left(n\right)$ gets really really big", BUT... if we wrote, for example, $O\left(10000000n^2 + 10^{1000}\right)$, it would tell us the same thing and would be more informative. What makes us use $O\left(n^2\right)$ instead?

You might say "you throw away all coefficients because if $n$ gets really big, then those coefficients don't matter", but in my head they do matter: no matter how big $n^2$ is, $10000000n^2$ is $10000000$ times bigger. So why do we throw it away? Because it's not important? But who tells us what is important and what isn't?
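To see why a comparison between *classes* survives even enormous coefficients, here is a small sketch (the step-count functions are my own illustration, using the question's numbers): a quadratic algorithm with a huge coefficient and a huge additive constant is still eventually beaten by any cubic algorithm, because the crossover is only a question of "eventually", not of the constants.

```python
# Hypothetical step-count functions (illustrative, taken from the question's
# numbers): a quadratic algorithm with enormous constants vs a plain cubic one.

def f(n):
    # "expensive" quadratic algorithm: huge coefficient, huge constant term
    return 10_000_000 * n**2 + 10**1000

def g(n):
    # "cheap" cubic algorithm: coefficient 1, no constant term
    return n**3

# For small and moderate n the constants make f look worse, yet f still
# eventually wins against g, because n**3 outgrows any multiple of n**2.
for n in (10, 10**7, 10**334):
    print(n, f(n) < g(n))  # False, False, then True once n**3 passes 10**1000
```

Python's arbitrary-precision integers make it easy to evaluate these counts exactly even at $n = 10^{334}$; no matter how the coefficients are chosen, there is always such a crossover point, which is exactly what the class $O(n^2)$ versus $O(n^3)$ records.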

I am sure I don't get something essential about all this idea of Big O, but I am desperate to find out what exactly.

There are 3 answers below.

Accepted answer

Certainly big-O is not a perfect tool, but who is perfect? The main idea behind it is to hide/suppress information that is insignificant compared with what is more significant, and to create a classification.

Big-O is a class, a set of functions, i.e. a method of classifying the so-called "speed", or "order", of growth. In fact, it is an abstraction. $10n^2$ and $10^6n^2$ are in the same class because they have the same "growth rate", which we designate by $O(n^2)$. Dropping the multiplicative constant lets us rescale the different functions $10n^2$ and $10^6n^2$ onto one another and put them in one class.
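The classification can be checked numerically against the definition from the question: $f = O(g)$ if some constant $C$ makes $|f(n)| \leqslant C|g(n)|$ hold. In this sketch (the functions and sample points are my own choices) both $10n^2$ and $10^6 n^2$ fit under $g(n) = n^2$ with suitable constants, while $n^3$ escapes any fixed constant:

```python
# Numerical illustration of the definition |f(n)| <= C * |g(n)|.
# The functions and sample ranges here are my own examples.

def bounded_by(f, g, C, ns):
    """Check |f(n)| <= C * |g(n)| at every sample point n in ns."""
    return all(abs(f(n)) <= C * abs(g(n)) for n in ns)

ns = range(1, 10_000)
g = lambda n: n**2

print(bounded_by(lambda n: 10 * n**2, g, C=10, ns=ns))        # True
print(bounded_by(lambda n: 10**6 * n**2, g, C=10**6, ns=ns))  # True
# n**3 is NOT O(n**2): for any fixed C the bound fails once n > C.
print(bounded_by(lambda n: n**3, g, C=100, ns=ns))            # False
```

The different constants $C = 10$ and $C = 10^6$ are precisely what gets "hidden" when both functions are filed under the one label $O(n^2)$.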

The second layer of abstraction is working in a neighborhood of infinity, i.e. again we single out a certain property and declare it the main one in comparison with the others. In a particular case this may even be misleading, since today's computers only ever work with bounded values.

Let me repeat: this approach has many drawbacks for any one given specific case, but any abstraction, any classification that replaces a specific object with an abstract one, always loses some of the specific properties of that particular object.

Answer 2

The idea is actually a very practical one: what exactly are these "steps" we are counting?

Suppose that we write out the algorithm in our favourite programming language, say Python, and then we decide that a "step" is a given line in our algorithm. We then look at the loops, think about the problem carefully, and decide that this algorithm will take (in the worst case, say) $10n^2+50$ steps (where $n$ is some clearly defined parameter relating to the size of the input). What would happen if we ran this?

Well, in reality the "if" statements and the "while" statements and the "for" statements might actually compile to a number of different commands in Assembly Language (the low-level language that your computer uses at its base level), and the number of these steps might differ (e.g. the if statements become 3 fundamental Assembly steps, for loops take 20, while loops take 50). These numbers might differ again if you write the same algorithm in a different programming language, or you update the same programming language, or you switch computers (which use different CPUs with different instruction sets), etc.

All of these minor possible variations in your computational device will actually have an effect on the speed of your algorithm. If you actually implement the same algorithm in different programming languages and plot the time against the size of the problem, it might differ from your "ideal computed number of steps" function $10n^2+50$, but one thing we can all agree on is that these many kinds of changes will NOT lead a quadratic-time algorithm to become a cubic-time algorithm (unless, for example, your programming language fundamentally misrepresents the architecture of your computer).
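This can be made concrete with a toy step-counting model (my own illustration, not from the original answer): two "implementations" of the same quadratic algorithm that charge different costs per inner-loop operation and different setup costs. The coefficients change between implementations, but the counts stay within a bounded factor of each other, which is exactly what "both are $O(n^2)$" asserts.

```python
# Toy model (hypothetical costs): per-"step" and setup costs vary between
# implementations, but the growth class of the total count does not.

def step_count(n, per_op, setup):
    steps = setup              # fixed startup cost of this implementation
    for _ in range(n):
        for _ in range(n):
            steps += per_op    # cost of one inner-loop operation
    return steps

# Two hypothetical implementations of the same quadratic algorithm:
fast = lambda n: step_count(n, per_op=3, setup=10)     # 3*n**2 + 10
slow = lambda n: step_count(n, per_op=50, setup=200)   # 50*n**2 + 200

# The ratio between the two counts stays bounded (it approaches 50/3),
# so a change of implementation never turns n**2 into n**3.
for n in (10, 100, 1000):
    print(n, slow(n) / fast(n))
```

Switching languages or CPUs amounts to changing `per_op` and `setup`; the bounded ratio is why asymptotic notation can make statements that survive every such change.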

THIS is precisely the idea: that the variations between implementations should be seen as unimportant somehow and yet we still need to make concrete mathematical statements about these "fuzzy rates of growth". Big O, Big $\Omega$, Big $\Theta$, little $o$ and various other asymptotic notations are there to solve this problem.

===

EDIT: As an afterthought, I feel I should add that this is kind of part of a more general theme in Theoretical Computer Science: we want results about computability that are true regardless of the specific model of computation we use.

In Computational Complexity Theory, we regularly want to discuss results which say that some problem, in a formal sense, is somehow "difficult to compute" and we don't want this to mean that some person's 1990s Dell finds it difficult to solve. We want this result to have some interpretation that tells you that no matter the computer, the problem is difficult.

Similarly, one of the first theorems anyone ever proves in a discussion concerning Kolmogorov Complexity is an invariance theorem which states that any two Turing machines' Kolmogorov complexity functions are necessarily the same up to some constant difference - this difference is something we're not supposed to care about and tells us that results can be proven about this concept without worrying about the specific computer architecture we're considering.

Answer 3

Let me try to give you a different perspective on this. I'm a Site Reliability Engineer. My day job is managing Google's production systems. One of my responsibilities is provisioning and capacity management - that is, my colleagues and I decide how much hardware we need to run our services, and where we need it.

In order to do this, we need to know, calculate, or empirically evaluate all of the following pieces of information:

  1. How many requests (per second) will we need to process?
  2. How big will each request be?
  3. How much does a request cost to process, in terms of how big it is?

For the most part, we evaluate all three of these questions through measurement and load testing - not with Big O. However, sometimes we also need to consider:

  1. Are all requests the same size, and if not, what happens to really big requests?

If our service is $O(1)$, then the answer is "We process them exactly the same as the small requests." If our service is $O(N)$, then the answer is "Each large request is equivalent* to some number of small requests based on its size, so we just need to increase the overall number of requests that we can process, based on the distribution of request sizes that we observe." But if our service is slower than $O(N)$, then we start getting concerned about this question. For example, an $O(N^2)$ service might slow to a crawl if you throw a small number of really big requests at it, whereas it could easily handle a large number of small requests. So now we need to test the service with large requests, which might not have been strictly necessary for an $O(N)$ service, and explicitly account for the overall distribution of request sizes in a much more careful and rigorous fashion than for an $O(N)$ service. Alternatively, we might need to redesign the service, stick a caching layer in front of it, or apply some other mitigation so that we can "pretend" the service is $O(N)$ or faster.
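Here is a rough sketch of that capacity-planning concern (a hypothetical cost model of my own, not actual production numbers): two workloads carry the same total payload, but for a super-linear service the size distribution changes the total cost dramatically.

```python
# Hypothetical cost model: a request of size s costs s**exponent units to
# process, where exponent = 1 models an O(N) service and 2 models O(N**2).

def total_cost(sizes, exponent):
    return sum(s**exponent for s in sizes)

many_small = [10] * 1000   # 1000 requests of size 10 (total payload 10,000)
few_large = [1000] * 10    # 10 requests of size 1000 (same total payload)

for exponent, label in [(1, "O(N) service:  "), (2, "O(N^2) service:")]:
    print(label,
          total_cost(many_small, exponent),
          total_cost(few_large, exponent))
```

Under the linear model both workloads cost the same, so only total traffic matters when provisioning; under the quadratic model the few-large-requests workload costs 100x more, which is why the request-size distribution suddenly has to be accounted for.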

The coefficients are, frankly, not very useful in this context. We're already measuring that information empirically anyway. You might wonder if we could somehow theoretically predict the results of our load tests and benchmarks, and thereby avoid doing them, but in practice, this turns out to be very difficult due to the high complexity of distributed systems. You would need to accurately model and characterize the performance of a lot of moving parts, most of which were not explicitly designed to provide real-time guarantees of performance. Given that empirical load testing is a straightforward and standardized process, it's simply infeasible to try and do this theoretically.


* A linear function may have a non-zero y-intercept, which makes this equivalence technically false. But in most cases the y-intercept is not large enough to make a practical difference. If it is large enough to matter, then we account for it as a separate component of the overall cost. Similarly, very few services are actually $O(1)$, because at an absolute minimum you need to unmarshal the request, which is itself an $O(N)$ operation; but in practice this is very cheap and may not be worth accounting for.