The main idea behind Big O notation


Well, when we use Big O notation we never know even approximate number of steps of a given algorithm, right? For example, if we have $O(n)$ algorithm, then we don't know how fast this algorithm itself is: its exact number of steps may be $10n$ or even $10^{100}n$. Thus we just put this algorithm in particular "class of functions" $O(n)$.

The reason for doing that is for us to be able to compare algorithms from different "classes". For example, we always know that $O(n)$ is going to be slower than $O(\log_2 n)$ for large $n$, because $\lim_{n \to +\infty} \frac{\log_2 n}{n} = 0$.

But if we are told that there are two different $O(n)$ algorithms, then we can't say which one is going to be faster, right? Moreover, if we are given a single $O(n)$ algorithm, we can't say how fast it is (because it may take $10n$ or even $10^{100}n$ steps and still be $O(n)$).

Is it the main and the only idea behind Big O notation? Or is there something else that I missed?

I've already googled it a million times and still have this confusion in my head. So please check my understanding and tell me if anything is wrong. Thanks in advance!

3 Answers

Answer (accepted, score 5)

Let me restate, for new participants in the discussion, the definition of big-$O$ for the non-negative case:

$$O(f) = \Big\{g\colon \exists C >0, \exists N \in \mathbb{N}, \forall n \Big(n>N \Rightarrow g(n) \leqslant C\cdot f(n)\Big) \Big \}$$

Less formally: $O(f)$ is the set of functions $g$ for which there exists a constant $C>0$ (depending on $g$) such that $C\cdot f$ dominates $g$ in some neighborhood of infinity (again depending on $g$).
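As a numeric sketch of that definition (my example, not from the answer): $g(n) = 10n + 5$ is in $O(n)$, witnessed by the hand-picked constants $C = 11$ and $N = 5$. The check below is a finite spot-check of the inequality, not a proof.

```python
def dominates(f, g, C, N, upto=10_000):
    """Spot-check the definition: g(n) <= C * f(n) for all N < n <= upto.

    A finite check, not a proof of membership in O(f).
    """
    return all(g(n) <= C * f(n) for n in range(N + 1, upto + 1))

# g(n) = 10n + 5 is in O(n): for n > 5 we have 10n + 5 <= 11n.
assert dominates(lambda n: n, lambda n: 10 * n + 5, C=11, N=5)

# Whereas g(n) = n^2 is not in O(n): no constant saves us for large n.
assert not dominates(lambda n: n, lambda n: n * n, C=100, N=10)
```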

Now back to the posed questions:

when we use Big O notation we never know even approximate number of steps of a given algorithm, right?

In general this is not true, because if we take $f=0$, then the unique member of $O(f)$ is $g=0$, and here we know exactly that the number of steps is zero.

But this is an extreme, possibly unique, case, so let me set it aside below. Even with $O(1)$, all we can say about its members is that they are bounded. Membership in $O(f)$ gives not an exact number of steps, but a scaled upper bound in a neighborhood of infinity.

we always know that $O(n)$ is going to be slower than $O(\log_2 n)$

This is a very common misconception. Because $\log_2 n < n$ for $n \in \mathbb{N}$, we have $O(\log_2 n) \subset O(n)$. It follows that $O(n)$ contains elements faster than $\log_2 n$: for example, every bounded function is an element of $O(n)$, and bounded functions are eventually faster than $\log_2 n$. So we can say that $O(n)$ contains elements slower than $\log_2 n$, but it contains faster elements too, and therefore we cannot say that $O(n)$ is slower than $O(\log_2 n)$. What we can say is that every element of $O(\log_2 n)$ is eventually faster than $f(n)=n$ itself, not faster than every member of $O(n)$.
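A numeric sketch of that point (my example, not from the answer): the bounded function $g(n)=5$ is a member of $O(n)$, yet it is eventually smaller than $\log_2 n$, so $O(n)$ does contain elements faster than $\log_2 n$.

```python
import math

def g(n):
    return 5  # bounded, hence in O(n): take C = 5, N = 0

# Membership witness: g(n) <= 5 * n for all n >= 1.
assert all(g(n) <= 5 * n for n in range(1, 100))

# Yet g is eventually below log2(n): log2(n) > 5 once n > 32.
assert all(g(n) < math.log2(n) for n in range(33, 1000))
```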

So a property that holds between the defining functions of the big-$O$s (I prefer to call them parents), namely $\lim\limits_{n \to +\infty} \frac{\log_2 n}{n} = 0$ for $\log_2 n$ and $n$, cannot be transferred to the members of $O(\log_2 n)$ and $O(n)$. The limit $\lim\limits_{n \to +\infty} \frac{O(\log_2 n)}{O(n)}$ makes no sense and is not defined in the same way that $O(\log_2 n) + O(n) = O(\log_2 n + n) = O(n)$ is.

if we are told that there are two different $O(n)$ algorithms, then we can't say which one is going to be faster, right?

Right.
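As a hypothetical illustration of why not: here are two functions that both compute the same result in $O(n)$ steps, yet differ by a constant factor of about 1000 in their step counts. Big-$O$ membership alone cannot rank them.

```python
def sum_fast(xs):
    """Sums a list in ~n steps: one addition per element."""
    steps = 0
    total = 0
    for x in xs:
        total += x
        steps += 1
    return total, steps

def sum_slow(xs):
    """Same result in ~1000n steps: redundant constant work per
    element, yet still O(n)."""
    steps = 0
    total = 0
    for x in xs:
        for _ in range(999):   # busywork: constant factor of ~1000
            steps += 1
        total += x
        steps += 1
    return total, steps

# Both are O(n): doubling the input doubles the work for each, but
# only the exact step counts reveal the ~1000x difference.
```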

Is it the main and the only idea behind Big O notation?

I cannot say it is the main and only idea. Depending on the task at hand, everyone should probably decide for themselves what the main idea is. The most important thing, imho, is to get correct results and to meet people who can and will evaluate what has been done.

Answers to questions in comments.

suppose you have an algorithm that takes O(n) steps. How do you know an approximate number of steps that this algorithm takes?

I wrote above that if $f\ne 0$, then the mere fact that $g \in O(f)$ does not give an exact number of steps for $g$, but gives a scaled upper bound on $g$ in a neighborhood of infinity. So what do we know about $g$ from the fact $g \in O(f)$? Only that it is no worse than $f$, up to a constant factor. So, if an algorithm's complexity function $g$ is in $O(n)$, then we know that its number of steps is, approximately, at most a scaled $n$.

how then are we supposed to say for sure what its worst case behavior is: $O(n^2)$ or $O(n)$?

Let's separate best-, worst-, and average-case behavior from class membership and consider a pure $f$, without knowing which case it describes. Of course, when we know that $f(n) \in O(n)$, we also have $f(n) \in O(n^k)$ for any $k>1$. If we find one upper bound, we have infinitely many. But the larger the upper bound we choose, the rougher the estimate: the "farther" we are from the estimated object. If $f$ is in $O(n)$, then it is also in $O(n^2)$, also in $O(n^3)$, and so on, but which one is the least rough?
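A small sketch of that nesting (my example): a single witness pair $(C, N)$ for $f \in O(n)$ also witnesses $f \in O(n^2)$, $O(n^3)$, and so on, since $n \leqslant n^2 \leqslant n^3$ for $n \geqslant 1$; each larger bound is a rougher estimate.

```python
def in_big_o(g, f, C, N, upto=1000):
    """Finite spot-check that g(n) <= C * f(n) for N < n <= upto."""
    return all(g(n) <= C * f(n) for n in range(N + 1, upto + 1))

g = lambda n: 3 * n + 7              # a linear step count

# One witness (C=4, N=7) works for every larger class too:
assert in_big_o(g, lambda n: n,      C=4, N=7)   # tightest: O(n)
assert in_big_o(g, lambda n: n * n,  C=4, N=7)   # rougher: O(n^2)
assert in_big_o(g, lambda n: n ** 3, C=4, N=7)   # rougher still: O(n^3)
```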

And a few words, finally, about the best, worst, and average cases. They are just the minimum, maximum, and expected values computed over the inputs, and everything written above for a general, pure $f$ is true for them as well.

A separate question is when, knowing the big-$O$ for one of them, we want to discuss the big-$O$ of the others. So if/when you want to analyze those, or discuss $\Theta(f)$, please ask specific questions.

Answer (score 3)

But if we are told that there are two different $O(n)$ algorithms, then we can't say which one is going to be faster, right?

Yes, this is correct. My late PhD supervisor disliked using these notions for this very reason. A smaller $O$ class means that, for sufficiently large inputs, the algorithm will be faster, but there's no guarantee how big $n$ would have to be in order to see a difference.

As a concrete example, the AKS primality test showed that primality testing could be done in polynomial time. This is, of course, an impressive achievement in computer science and pure number theory. In practical terms, you're better off using other algorithms, despite the fact that they do not terminate in polynomial time.

Answer (score 7)

Your understanding is right: big-O notation only shows you how algorithm time and memory use scales for large inputs and doesn't necessarily tell you anything about time or memory use in absolute numbers. For small inputs, simpler algorithms with worse asymptotic performance may be faster in practice. For instance, multiplying two $n \times n$ matrices using the straightforward method takes $O(n^3)$ time while fancier algorithms can do better (the current best known algorithm is $O(n^{2.373})$). But in practice, the straightforward method is faster for inputs of less than a few dozen rows.
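The straightforward method mentioned above is just three nested loops, each running $n$ times, hence roughly $n^3$ scalar multiplications. A minimal sketch (pure Python, no libraries):

```python
def matmul(A, B):
    """Multiply two n x n matrices the straightforward way: ~n^3
    scalar multiplications from three nested loops over n."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):            # n rows
        for j in range(n):        # * n columns
            for k in range(n):    # * n terms per dot product
                C[i][j] += A[i][k] * B[k][j]
    return C

I = [[1, 0], [0, 1]]
M = [[2, 3], [4, 5]]
assert matmul(I, M) == M          # identity leaves M unchanged
```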

In practice, though, an $O(n \log n)$ solution to some problem will usually be meaningfully faster than an $O(n^2)$ solution for problem sizes small enough to show up in the real world (which, in turn, would be meaningfully faster than an $O(n^3)$ or $O(n 2^n)$ or $O(n!)$ solution). But this isn't necessarily true, especially if we're discussing worst-case rather than average-case performance. The standard method for solving linear programming, the simplex method, has worst-case exponential time but is fast in practice, and quicksort (with $O(n^2)$ worst-case performance) is often faster than other sorting algorithms with guaranteed $O(n \log n)$ performance.
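A hypothetical sketch of that last point about quicksort: with a naive first-element pivot, already-sorted input is the classic worst case, and the comparison count grows quadratically even though average-case behavior is $O(n \log n)$.

```python
def quicksort(xs, count=None):
    """Quicksort with a first-element pivot; returns (sorted list,
    comparison counter). We count one comparison per partitioned
    element, a standard simplified cost model."""
    if count is None:
        count = [0]
    if len(xs) <= 1:
        return list(xs), count
    pivot, rest = xs[0], xs[1:]
    count[0] += len(rest)                    # one comparison per element
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    ls, _ = quicksort(left, count)
    rs, _ = quicksort(right, count)
    return ls + [pivot] + rs, count

# Sorted input of size n triggers (n-1) + (n-2) + ... + 1 = n(n-1)/2
# comparisons -- quadratic, despite O(n log n) average behavior:
out, c = quicksort(list(range(100)))
assert out == list(range(100))
assert c[0] == 100 * 99 // 2
```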