Visualize $T(n)=2T(n/2)+O(n)=O(n\log(n))$ on a Tree


I understand the mathematical proof that $T(n)=2T(n/2)+O(n)=O(n\log(n))$, but I cannot visually wrap my head around how it works. Intuitively, it feels like it should be $O(n^2)$. Can you show me why these two trains of thought are wrong:

  1. Imagine a binary tree with $n$ nodes. On each node an $O(n)$ operation needs to be executed. A recursive function for this would thus be $T(n)=O(n)+T(\text{left})+T(\text{right})=2T(n/2)+O(n)$. Since there are $n$ nodes and each operation takes $O(n)$ time, $T(n)=n\cdot O(n)=O(n^2)$.
  2. If you took this binary tree and simplified the operation to be $O(1)$, the complexity for this new tree would be $T(n)/O(n)=O(n\log(n))/O(n)=O(\log(n))$ (since things are now less complex by a factor of $O(n)$). However, it is mathematically proven that $T(n)=2T(n/2)+O(1)=O(n)$, not $O(\log(n))$.

Note: I understand the mathematical proof for why $T(n)=2T(n/2)+O(n)=O(n\log(n))$ works. However I cannot visualize this or grasp it intuitively.

Best answer:

I think you're misinterpreting what the master theorem describes. The task here is to process a tree, which is not the same as simply processing the nodes one by one. Sorry for the pun, but it's like you're missing the tree in the forest behind the nodes. :-)

> On each node an $O(n)$ operation needs to be executed.

That's wrong — exactly because the real task is to process a tree, not just individual nodes one at a time. That's why the number of operations is NOT the same at different nodes. For example, simply by the definition of the function $T$: the top node requires $T(n)$ operations — because it's the root of a tree of size $n$ that we need to process; but each leaf, being a tree with a single node, requires only $T(1)$ operations, which is a constant.

> Since there are $n$ nodes and each operation takes $O(n)$ time, $T(n)=n\cdot O(n)=O(n^2)$.

Wrong, as explained above — different nodes require different time.
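To see concretely that different nodes cost different amounts, here is a small illustrative sketch (my own, not from the question; the function name is made up) that simply counts the operations prescribed by the recurrence $T(n)=2T(n/2)+n$ with $T(1)=1$:

```python
def total_work(n):
    """Total operations for T(n) = 2*T(n/2) + n, with T(1) = 1."""
    if n <= 1:
        return 1  # a leaf: a single-node subtree costs a constant
    # n extra operations at this node, plus two half-size subtrees
    return n + 2 * total_work(n // 2)

# For n = 1024 (a power of 2) the closed form is n*log2(n) + n = 11264,
# far below n**2 = 1048576.
```

The root alone accounts for $n$ of those operations, while each of the $n$ leaves accounts for just one, which is exactly why multiplying "$n$ nodes $\times$ $O(n)$ each" overcounts.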

> If you took this binary tree and simplified the operation to be $O(1)$, …

Same thing.

> … the complexity for this new tree would be $T(n)/O(n)=O(n\log(n))/O(n)=O(\log(n))$ …

Sorry, but this simply doesn't make sense. You started the calculation by substituting $T(n)=O(n\log(n))$, i.e., by assuming from the start the very thing we are trying to deduce, which is a serious logical error. And even setting that aside: if you already know what $T(n)$ is, it makes no sense to then "deduce" something different for it.
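For intuition on the $O(1)$ case, one can again just count the operations of $T(n)=2T(n/2)+1$ directly (an illustrative sketch of mine, not part of the original answer): the total is dominated by the $n$ leaves, which is why it comes out linear rather than "$O(n\log n)$ divided by $O(n)$":

```python
def total_work_const(n):
    """Total operations for T(n) = 2*T(n/2) + 1, with T(1) = 1."""
    if n <= 1:
        return 1  # a leaf costs a constant
    # constant work at this node, plus two half-size subtrees
    return 1 + 2 * total_work_const(n // 2)

# For a power of 2 this is exactly 2*n - 1: roughly one operation per node
# of the recursion tree, hence O(n), not O(log n).
```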


What the master theorem describes is the divide-and-conquer approach. Your main misconception is that the same thing happens at every node, but that's not true. In fact, three things happen at each node (for a binary tree):

  1. We need to process the left subtree, at a cost of $T(\text{left})$;
  2. We need to process the right subtree, at a cost of $T(\text{right})$;
  3. And we need some time to perform additional operations at the node, such as forming subtrees, putting the results from the subtrees together, etc.

The "$+O(n)$" term in the recursive formula in your example refers to the computational cost of the last part only — for a node which is the root of a (sub)tree of size $n$; it is NOT the entire cost of what happens at a node, and it is NOT the same for all nodes. The total number of operations that we have to perform at such a node is by definition denoted $T(n)$, and it's in fact expressed by the recursive relation $$T(n)=O(n)+T(\text{left})+T(\text{right})=2T(n/2)+O(n).$$
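The picture behind $O(n\log n)$ is the recursion tree summed level by level: level $k$ holds $2^k$ subproblems of size $n/2^k$, so the "$+O(n)$" terms across any one level add up to $n$, and there are about $\log_2 n$ levels. A small sketch of that bookkeeping (a hypothetical helper, assuming $n$ is a power of 2):

```python
def level_costs(n):
    """Per-level cost of the '+n' terms in T(n) = 2*T(n/2) + n (n a power of 2)."""
    costs = []
    size, count = n, 1
    while size > 1:
        costs.append(count * size)  # 'count' subproblems, each doing 'size' extra work
        size //= 2
        count *= 2
    costs.append(count)             # finally, 'count' = n leaves at O(1) each
    return costs

# Every internal level costs exactly n, and there are log2(n) of them
# (plus the leaf level), so the total is n*log2(n) + n -- i.e., O(n log n).
```

For example, `level_costs(8)` gives a cost of 8 on each of the four levels: the "$+O(n)$" work shrinks per node as you descend, but the number of nodes grows at exactly the rate needed to keep each level's total at $n$.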