$f(n) \in O\left(n^2\right)$. Why do you think you know how $f(n)$ scales?


This is my fifth question in an attempt to figure out what $f(n) \in O\left(n^2\right)$ actually means. I'll try to make it as simple as possible.

Facts I'm already aware of:

  1. Definition of Big O.
  2. If we have two functions from different Big O sets, we can figure out which one is going to eventually be bigger.

What we're given (in my question):

  1. We're given some algorithm that takes $f(n)$ steps in the worst case. And $f(n) \in O\left(n^2\right)$.

Preliminaries:

  1. I showed visually in What does the word "scalability" mean in terms of Big O? that you never know exactly how the function $f(n)$ scales, in the sense that the ratio $\frac{f(2n)}{f(n)}$ can differ significantly from $\frac{\left(2n\right)^2}{n^2} = 4$ for any value of $n$ (no matter how large $n$ is), despite the fact that $f(n) \in O\left(n^2\right)$.
  2. In simple words it just means that the statement: "Double the input size $\implies$ quadruple the runtime" can't be taken on faith just from knowing that $f(n) \in O\left(n^2\right)$.
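To make the preliminary point concrete, here is a tiny sketch with an assumed example $f(n) = n$, which is certainly in $O(n^2)$ yet never quadruples when $n$ doubles:

```python
# Assumed example (not from the question): f(n) = n is in O(n^2)
# (take C = 1 in the definition), yet doubling the input size
# only doubles f -- the ratio f(2n)/f(n) is never 4.
def f(n):
    return n

for n in [10, 100, 1000]:
    print(n, f(2 * n) / f(n))  # the ratio is always 2, not 4
```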

We often encounter statements like these:

  1. Function $f(n) \in O\left(n^2\right)$ grows / scales quadratically.
  2. For function $f(n) \in O\left(n^2\right)$ we know that if we double its input size, then its runtime is going to quadruple (assuming $n$ is large enough).

Question:

We already showed that you can never know exactly how your algorithm's runtime will increase if you double (or otherwise change) the input size. Then what do statements like these mean? Are they referring to the function $f(n)$ itself, or to its upper bound? Here's what I mean:

  1. Do they (statements) assume that $f(n) = kn^2$ for some value of $k$ and then talk about how $f(n) = kn^2$ scales? Meaning, instead of finding the ratio of $\frac{f(2n)}{f(n)}$ you can just find the ratio $\frac{\left(2n\right)^2}{n^2} = 4$ and then you can say: "Double the input size of your algorithm $\implies$ quadruple the runtime". But then it's just an approximation and, moreover, it can be very wrong, since we know nothing about $f(n)$ itself except for the fact that $f(n) \in O\left(n^2\right)$. If that's the case, then why do people trust this approximation if they don't even know how wrong or right it is?
  2. My second guess (that might be useless though) is that statements like these refer to the upper bound itself. I mean, since we know that $f(n) \in O\left(n^2\right)$, then we know that eventually $f(n) \leqslant Cn^2$ for some constant $C$. From here we can talk about how the upper bound $Cn^2$ scales. If that's the case, then I don't understand why it even makes sense to talk about how the upper bound scales?

I'm not asking why people write such statements. I'm just asking what they mean exactly. Though I admit some of these statements might be wrong. Whatever. I just want to understand it right. Thank you in advance!

3 Answers

Answer (score 3):

In my limited experience in competitive programming, people often write $O(n^2)$ even if they actually know that the runtime is $f(n) \sim Cn^2$ for some $C > 0$, where $g(n) \sim h(n)$ means $\lim_{n \to \infty}\frac{g(n)}{h(n)} = 1$. In this case, $f(2n) \sim 4Cn^2$ so $\frac{f(2n)}{f(n)} \sim 4$. But you are right that if you interpret $O$ literally, then the bound might not be sharp enough to conclude that $\frac{f(2n)}{f(n)} \sim 4$.
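A quick numerical sketch of this distinction, with a made-up runtime $f(n) = 3n^2 + 5n$, so that $f(n) \sim 3n^2$ in the sense above:

```python
# Assumed runtime f(n) = 3n^2 + 5n, i.e. f(n) ~ 3n^2 with C = 3.
# When the bound is sharp in this sense, f(2n)/f(n) really does tend to 4.
def f(n):
    return 3 * n**2 + 5 * n

for n in [10, 1_000, 100_000]:
    print(n, f(2 * n) / f(n))  # approaches 4 as n grows
```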

Answer (score 13):

When we say that $f(x)\in O(x^2)$ we are already talking about a worst-case scenario, since we are bounding the maximum value $f(x)$ can take. We are not interested in the exact time or the number of steps that the algorithm takes for each set of inputs; we want to make sure that in the worst case (or the average case) our algorithm will be constrained. So, answering your question, scalability means: if we double the size of the input, then the runtime will be no more than 4 times the worst-case runtime bound for the original $n$.

In this sense, you are right about your second guess. It makes sense to talk about how the upper bound scales since the upper bound is a guarantee on the maximum number of steps.

Edit: After some discussion, I think I'm prepared to improve the answer.

First of all, let us assume that $f(x)$ is everywhere positive (to avoid division by 0), and that $f(x)\leq C x^2,$ so that $f(x)\in O(x^2).$

What the big-Oh notation doesn't imply

The big-Oh notation doesn't tell us anything about $\frac{f(nx)}{f(x)},$ for any $x$, since $f$ can behave very erratically.

What the big-Oh notation does imply

The big-Oh notation implies that $\left\{\frac{f(nx)}{n^2f(x)} \right\}$ is bounded above by a constant that depends only on $x$ and the value $f(x)$, namely $\frac{Cx^2}{f(x)}.$ Indeed, notice that $\frac{f(nx)}{n^2f(x)}\leq \frac{C(nx)^2}{n^2f(x)}=\frac{Cx^2}{f(x)}.$

Thus, with respect to a particular pair $x,n,$ the big-Oh notation does not provide any information about the scalability of the original function $f,$ as the value of $\frac{f(nx)}{f(x)}$ can be arbitrary. But in general, we know that this value is bounded above by $n^2\frac{Cx^2}{f(x)}.$ So we can't say that if you double the size of the input you will get 4 times the runtime, but we can say that if you double the size of the input, you will get at most $4k$ times the runtime, where $k = \frac{Cx^2}{f(x)}$ can be controlled.
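Here is a small sketch of that guarantee, with an assumed $f(x) = x^2(2+\sin x)$, which satisfies $0 < f(x) \leq 3x^2$ for $x > 0$ (so $C = 3$):

```python
import math

C = 3.0

# Assumed example: 0 < f(x) <= 3 x^2 for all x > 0, so f is in O(x^2),
# even though f wobbles and f(2x)/f(x) is not a fixed number.
def f(x):
    return x**2 * (2 + math.sin(x))

x, n = 7.0, 2           # double the input size (n = 2)
k = C * x**2 / f(x)     # the constant k = C x^2 / f(x) from the answer
assert f(n * x) / f(x) <= n**2 * k  # guaranteed bound: at most 4k for n = 2
```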

Answer (score 6):

First a few formal facts, afterwards I will give my interpretation. I will always assume that the function $f$ is non-negative.

  1. Let $M\in]1,\infty[$ (so that $f$ below is positive) and let* \begin{split}f:\mathbb N&\to\mathbb R,\\ n&\mapsto n^2(k(M)-(n\bmod 2)),\end{split} where $k(M)\overset{\text{Def.}}=\frac{M}{M-1}$. Then we have $f\in\Theta(n^2)$, and for any odd $n\in\mathbb N$ we have $$\frac{f(2n)}{f(n)}=4\frac{k(M)}{k(M)-1}=4M.$$ But $4M$ can be arbitrarily large (much larger than $4$, for example)!
  2. Indeed as mentioned in an answer to your previous question, we have $$\lim_{k\to\infty} \sqrt[k]{\frac{f(2^k n)}{f(n)}}=4$$ for any $f\in\Theta(n^2)$. (This follows more or less directly from the Definition.) This is no longer true for $f\in O(n^2)$ though. Consider for example $f(n)=n\in O(n^2)$, then the limit is $2$, not $4$.
  3. As mentioned in my comment, we have $$0<\liminf_{n\to\infty}\frac{f(2n)}{f(n)}\le\limsup_{n\to\infty}\frac{f(2n)}{f(n)}<\infty$$ for any $f\in\Theta(n^2)$. This fails once again if $f\in O(n^2)$, as for example $f(n)=\exp(-n)\in O(n^2)$ shows.
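A numerical check of example 1 above, taking $M = 100$ (so the doubling ratio at odd $n$ is $4M = 400$, not $4$):

```python
M = 100.0
k = M / (M - 1)  # k(M) from example 1

# The erratic function from example 1: f(n) = n^2 (k(M) - (n mod 2)).
# It is in Theta(n^2), yet at odd n doubling the input
# multiplies f by 4M rather than 4.
def f(n):
    return n**2 * (k - (n % 2))

for n in [3, 101, 999]:        # odd inputs
    print(n, f(2 * n) / f(n))  # always 4*M = 400, up to rounding
```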

My interpretation is this: If you have $f\in O(n^2)$ and people say "doubling the input size quadruples the runtime", what they mean is that whenever you double the input size, the worst time that you are prepared for at $2n$ is $4$ times the worst time that you are prepared for at $n$. So when they write $f\in O(n^2)$, what they implicitly mean is: "I am prepared for $f$ to scale like $C n^2$ for some $C>0$, and so I am prepared for $f(2n)$ to be at most $4 C n^2$, which is $4$ times the worst time that I am prepared for when it comes to $f(n)$."


* As a fun exercise: Try to write $f$ so that it is defined on all of $\mathbb R_+$, is smooth, and coincides on $\mathbb N$ with the $f$ above. (Hint: Try rescaling something like the cosine to replace $n\mod 2$.)


As a final remark: In practice, it is quite unusual to see erratic complexity functions like the one constructed above.