I am looking at the proof of the following theorem and I have some questions.
The theorem is the following:
On the assumption that all permutations of a sequence of $n$ elements are equally likely to appear as input, any decision tree that sorts $n$ elements has an expected depth of at least $\log n!$.
The proof is the following:
Let $D(T)$ be the sum of the depths of the leaves of a binary tree $T$.
Let $D(m)$ be the smallest value of $D(T)$ taken over all binary trees $T$ with $m$ leaves.
We shall show, by induction on $m$, that $D(m) \geq m \log m$.
The basis, $m=1$, is trivial.
Now, assume the inductive hypothesis is true for all values of $m$ less than $k$.
Consider a decision tree $T$ with $k$ leaves.
$T$ consists of a root having a left subtree $T_{i}$ with $i$ leaves and a right subtree $T_{k-i}$ with $k-i$ leaves, for some $i$ with $1 \leq i \leq k-1$.
Clearly, $$D(T)=i+D(T_{i})+(k-i)+D(T_{k-i})$$
Therefore, the minimum sum is given by $$D(k)=\min_{1 \leq i \leq k-1} [k+D(i)+D(k-i)] \ \ \ \ (*)$$
Invoking the inductive hypothesis, we obtain from $(*)$ $$D(k) \geq k+\min_{1 \leq i \leq k-1} [i \log i+(k-i) \log (k-i)]$$
It is easy to show that the minimum occurs at $i=k/2$. Thus $$D(k) \geq k+k \log \frac{k}{2}=k \log k$$
We conclude that $D(m) \geq m \log m$ for all $m \geq 1$.
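If it helps to see the induction in action, the recurrence $(*)$ can be evaluated directly. The following is a small sketch (function names are my own) that computes $D(m)$ from the recurrence and checks the bound $D(m) \geq m \log m$, with logs base $2$:

```python
# Numeric check of the recurrence (*) and the bound D(m) >= m log m.
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def D(m):
    """Smallest sum of leaf depths over all binary trees with m leaves."""
    if m == 1:
        return 0
    # Split the m leaves between a left subtree (i leaves) and a right one.
    return min(m + D(i) + D(m - i) for i in range(1, m))

for m in range(1, 17):
    assert D(m) >= m * log2(m) - 1e-9      # the bound of the induction
    if m & (m - 1) == 0:                   # powers of two: bound is tight
        assert abs(D(m) - m * log2(m)) < 1e-9
```

For powers of two the bound holds with equality, which matches the balanced split $i = k/2$ being optimal.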
Now we claim that a decision tree $T$ sorting $n$ random elements has at least $n!$ leaves.
Moreover, exactly $n!$ leaves will have probability $1/n!$ each, and the remaining leaves will have probability zero.
We may remove from $T$ all vertices that are ancestors only of leaves of probability zero, without changing the expected depth of $T$.
We are thus left with a tree $T'$ having $n!$ leaves each of probability $1/n!$.
Since $D(T') \geq n! \log n!$, the expected depth of $T'$ (and hence of $T$) is at least $(1/n!)n! \log n!=\log n!$.
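For concreteness, the bound $\log n!$ (base $2$, so it counts comparisons) can be evaluated for small $n$; a quick sketch:

```python
# Evaluate the lower bound log2(n!) on expected depth for small n.
from math import log2, factorial, ceil

for n in range(2, 8):
    lb = log2(factorial(n))
    print(f"n={n}: log2(n!) = {lb:.3f}, so any comparison sort "
          f"needs at least {ceil(lb)} comparisons in the worst case")
```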
My questions are the following:
- Why do we want to show, by induction, that $D(m) \geq m \log m$ ??
- When we consider a decision tree $T$ with $k$ leaves where $T_{i}$ is the left subtree with $i$ leaves and $T_{k-i}$ is the right subtree with $k-i$ leaves, why does it stand that $$D(T)=i+D(T_{i})+(k-i)+D(T_{k-i})$$ ??
- How do we conclude that the minimum sum is given by $$D(k)=\min_{1 \leq i \leq k-1} [k+D(i)+D(k-i)]$$ ??
- How do we obtain, from the inductive hypothesis, that $$D(k) \geq k+\min_{1 \leq i \leq k-1} [i \log i +(k-i) \log (k-i) ]$$ ??
- Could you explain to me the part after "Now, assume the inductive hypothesis is true for all values of $m$ less than $k$." ??
1) Because you want the expected depth, you need a way to control the total depth of the leaves of the decision tree. Since the tree has at least $n!$ leaves (one for each possible order), each reached with probability $1/n!$, the expected depth is $D(T)/n!$; so proving $D(m) \geq m\log m$ is exactly what gives you $\frac{n!\log(n!)}{n!}=\log(n!)$.
2) When you glue the two subtrees together under a new root, you create a new level. Because of that, every path from a leaf to the root grows by $1$; there are $k=i+(k-i)$ leaves in total, so you must add $k$ to the sum $D(T_i)+D(T_{k-i})$.
3) Notice that $D(T)=i+D(T_i)+(k-i)+D(T_{k-i})=k+D(T_i)+D(T_{k-i})$; also, by letting $i$ range over all splits, you are iterating over all possible binary trees with $k$ leaves, split into subtrees with $i$ and $k-i$ leaves.
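The gluing identity in 2) and 3) is easy to confirm on a concrete tree. A minimal sketch, using nested tuples as my own tree representation (a leaf is `None`):

```python
# A binary tree is (left, right); a leaf is None.
def depth_sum(t, d=0):
    """Sum of leaf depths: each leaf contributes its distance from the root."""
    if t is None:
        return d
    left, right = t
    return depth_sum(left, d + 1) + depth_sum(right, d + 1)

def leaves(t):
    """Number of leaves of the tree."""
    if t is None:
        return 1
    return leaves(t[0]) + leaves(t[1])

left = (None, (None, None))   # subtree with i = 3 leaves, depth sum 5
right = (None, None)          # subtree with k - i = 2 leaves, depth sum 2
glued = (left, right)         # glue under a new root: k = 5 leaves

# D(T) = k + D(T_left) + D(T_right): every leaf gets one level deeper.
assert depth_sum(glued) == leaves(glued) + depth_sum(left) + depth_sum(right)
```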
4) Notice that $k$ is fixed and recall that $D(i)\geq i\log i$ and $D(k-i)\geq (k-i)\log(k-i)$; also notice that both $i$ and $k-i$ are smaller than $k$, so the inductive hypothesis applies to each.
5) See Strong induction.
Edit:
2) Yes, I am glad to. But you will understand it better if you draw a picture.
Let $C_i:=$ the set of binary trees with $i$ leaves. You have $C_i=\bigcup _{k=1}^{i-1} C_k \times C_{i-k}$, because from any binary tree you can always take out the root (call it $r$), and the result is two trees: the left and right subtrees, with roots $r_l$ and $r_r$ respectively. If you draw the picture, you will see that the unique path from any leaf to the root of its subtree is the same as its path in the original tree, minus the edge that joins the root $r$ to $r_l$ or $r_r$. So the depth of every single leaf in the original tree is one plus its depth in its subtree. Hence, if there are $k$ leaves, the sum of depths must be increased by $k=i+(k-i)$.
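That recursive description of $C_i$ can also be checked by counting: the recursion for $|C_i|$ produces the Catalan numbers. A small sketch (function name is my own):

```python
# Count binary trees with i leaves using the recursive split of C_i.
from functools import lru_cache

@lru_cache(maxsize=None)
def count(i):
    """Number of binary trees with i leaves (every internal node has 2 children)."""
    if i == 1:
        return 1
    # Choose k leaves for the left subtree, i - k for the right one.
    return sum(count(k) * count(i - k) for k in range(1, i))

# These are the Catalan numbers 1, 1, 2, 5, 14, 42, ...
assert [count(i) for i in range(1, 7)] == [1, 1, 2, 5, 14, 42]
```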
3) Because you always take the minimum: the identity $D(T)=k+D(T_i)+D(T_{k-i})$ holds for every single tree $T$, so in particular it holds for the tree that attains $D(k)$, and minimizing over $i$ gives the formula. Think about how the $\min$ function works.
5) Yes, recall that binary trees are defined in a recursive way, as pointed out in "2)" above. If you have the desired property for every $m<k$, then, because of that recursive definition, you can construct any possible tree with $k$ leaves by gluing two subtrees for which the problem is already solved. Assuming the property for every single $m<k$ and using that to conclude it for $k$: that is strong induction.
Hope it helps.