Well, when we use Big O notation, we don't know even the approximate number of steps of a given algorithm, right? For example, if we have an $O(n)$ algorithm, we don't know how fast the algorithm itself is: its exact number of steps may be $10n$ or even $10^{100}n$. So we just put the algorithm in a particular "class of functions", $O(n)$.
The reason for doing that is so we can compare algorithms from different "classes". For example, we always know that an $O(n)$ algorithm is going to be slower than an $O(\log_2 n)$ one for large $n$, because $\lim_{n \to +\infty} \frac{\log_2 n}{n} = 0$.
But if we are told that there are two different $O(n)$ algorithms, we can't say which one is going to be faster, right? Moreover, if we are given a single $O(n)$ algorithm, we can't say how fast it is (it may take $10n$ or even $10^{100}n$ steps and still be $O(n)$).
Is this the main and only idea behind Big O notation? Or is there something else that I missed?
I've already googled this a million times and still have this confusion in my head. So please check my understanding and tell me if anything is wrong. Thanks in advance!
Let me repeat, for new participants in the discussion, the definition of big-$O$ for the non-negative case:
$$O(f) = \Big\{g\colon \exists C >0, \exists N \in \mathbb{N}, \forall n \Big(n>N \Rightarrow g(n) \leqslant C\cdot f(n)\Big) \Big \}$$
Less formally, we can say that $O(f)$ is the set of functions $g$ for each of which there exists a constant $C>0$, depending on $g$, such that $C\cdot f$ dominates $g$ in some neighborhood of infinity (a neighborhood that again depends on $g$).
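For concrete functions, the witnesses $C$ and $N$ in this definition can be checked numerically. Here is a minimal Python sketch of my own (the helper `dominates` and the finite `horizon` are my illustration, not part of the formal definition — a finite check is a sanity test, not a proof):

```python
def dominates(g, f, C, N, horizon=10_000):
    """Check the witness (C, N) for g(n) <= C * f(n) on a finite range
    of n.  The real definition quantifies over ALL n > N; this only
    samples n = N+1 .. horizon-1, so it can refute a witness but never
    fully prove one."""
    return all(g(n) <= C * f(n) for n in range(N + 1, horizon))

# g(n) = 10**100 * n is in O(n): the witness C = 10**100, N = 0 works.
assert dominates(lambda n: 10**100 * n, lambda n: n, C=10**100, N=0)

# g(n) = n**2 is not in O(n): the candidate witness C = 10, N = 0
# already fails at n = 11, since 121 > 110.
assert not dominates(lambda n: n * n, lambda n: n, C=10, N=0)
```

Note that the huge constant $10^{100}$ does not matter for membership in $O(n)$ — exactly the point made in the question.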
Now back to posed questions:
In general this is not true, because if we take $f=0$, then the unique member of $O(f)$ is $g=0$, and here we know exactly that the number of steps is zero.
But this is an extreme and possibly unique case, so let me leave it out of consideration below. Even for $O(1)$, all we can say about its members is that they are bounded. Being in $O(f)$ gives not the exact number of steps, but a scaled upper bound in a neighborhood of infinity.
This is a very common misconception. Because $\log_2 n < n$ for $n \in \mathbb{N}$, we have $O(\log_2 n) \subset O(n)$. This means that $O(n)$ contains elements that are faster than $\log_2 n$: for example, any bounded function belongs to $O(n)$, and bounded functions are faster than $\log_2 n$. So we can say that $O(n)$ contains elements slower than $\log_2 n$, but it contains faster elements as well, and therefore we cannot say that $O(n)$ is slower than $O(\log_2 n)$. What we can say is that all elements of $O(\log_2 n)$ are faster than $f(n)=n$ itself, not than the members of $O(n)$.
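To make the "faster members of $O(n)$" point concrete, here is a small Python check of my own (reading "faster" pointwise for large $n$, and using the constant function $g(n)=5$ as the bounded example):

```python
import math

# g(n) = 5 is in O(n): the witness C = 5, N = 0 works, since
# 5 <= 5 * n for every n >= 1.
g = lambda n: 5
assert all(g(n) <= 5 * n for n in range(1, 10_000))

# Yet g is eventually *smaller* than log2(n): log2(33) > 5, so for
# every n > 32 the O(n) member g runs faster than log2(n).
assert all(g(n) < math.log2(n) for n in range(33, 10_000))
```

So a member of the "bigger" class $O(n)$ can be strictly faster than the parent $\log_2 n$ of the "smaller" class.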
So a property of the "borders" of the big-Os (I prefer to call them parents), $\log_2 n$ and $n$ — namely $\lim\limits_{n \to +\infty} \frac{\log_2 n}{n} = 0$ — cannot be transferred to the members of $O(\log_2 n)$ and $O(n)$. The limit $\lim\limits_{n \to +\infty} \frac{O(\log_2 n)}{O(n)}$ makes no sense and is not defined in the same sense in which $O(\log_2 n) + O(n) = O(\log_2 n + n) = O(n)$ is.
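For completeness, here is a one-line justification (my own sketch) of the sum rule quoted above. If $g_1 \in O(\log_2 n)$ with witness $(C_1, N_1)$ and $g_2 \in O(n)$ with witness $(C_2, N_2)$, then, since $\log_2 n \leqslant n$ for $n \geqslant 1$,

$$g_1(n) + g_2(n) \leqslant C_1 \log_2 n + C_2 n \leqslant (C_1 + C_2)\, n \quad \text{for all } n > \max(N_1, N_2, 1),$$

so $g_1 + g_2 \in O(n)$ with witness $C = C_1 + C_2$.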
Right.
I can say nothing about "the main and only one". Depending on the task, probably everyone should decide for themselves what the main idea is. The most important thing, imho, is to get the right results and to meet people who can and will evaluate what has been done.
Answers to questions in the comments.
I wrote above that if $f\ne 0$, then for any $g$, the mere fact that $g \in O(f)$ does not give the exact number of steps for $g$, but gives a scaled upper bound on $g$ in a neighborhood of infinity. So what do we know about $g$ from the fact $g \in O(f)$? Only that it is no worse than $f$. So, if an algorithm's complexity function $g$ is in $O(n)$, then we know that the number of steps of $g$ is, approximately, less than a scaled $n$.
Let's separate the behavior for best, worst, or average cases from membership in a class, and consider a pure $f$, without knowing which case it is. Of course, when we know that $f(n) \in O(n)$, we also have $f(n) \in O(n^k)$ for any $k>1$. If we find one upper bound, we immediately have infinitely many. But the bigger the upper bound we choose, the rougher the estimate — the "farther" we are from the estimated object. If $f$ is in $O(n)$, then it is also in $O(n^2)$, also in $O(n^3)$, and so on, but which one is the least rough?
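The growing roughness of the higher bounds can be seen numerically. A sketch of my own, with the hypothetical example $f(n) = 3n + 7$ and the shared witness $C = 10$:

```python
# f(n) = 3n + 7 is in O(n), hence also in O(n^2), O(n^3), ...
f = lambda n: 3 * n + 7

# One witness C = 10, N = 1 serves every bound f(n) <= C * n**k, k >= 1:
for k in (1, 2, 3):
    assert all(f(n) <= 10 * n**k for n in range(2, 10_000))

# But the slack (ratio of the bound to f) behaves very differently:
slack = lambda k, n: (10 * n**k) / f(n)

assert slack(1, 10**6) < 4        # tight bound: ratio stays bounded (~10/3)
assert slack(2, 10**6) > 10**6    # rough bound: ratio grows without limit
```

The $O(n)$ bound stays within a constant factor of $f$, while the $O(n^2)$ bound drifts arbitrarily far away — that is the sense in which it is "rougher".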
And a few words, at the end, about best, worst, and average cases. They are only the minimum, maximum, and expected values calculated depending on the input, and everything written above for a general, pure $f$ is true for them as well.
Another question arises when, knowing the big-O of one of them, we want to discuss the big-O of the others. So, if/when you want to analyze those, or to discuss $\Theta(f)$, then please ask specific questions.