For two letters $X$,$Y$ one can form $2^n$ words (arbitrary sequences composed of the letters $X,Y$) such that none of these words is an initial segment of any other of these words.
An initial segment of a word can be of any length, from a single first letter, to the whole word. Show that the sum of lengths of these words is at least $2^nn$.
PS:
I have tried using Huffman Tree https://en.wikipedia.org/wiki/Huffman_coding as my understanding is that the question is asking is that if a huffman tree has n nodes, the sum of the length of the encoding is >= n*2^n.
However I have not had any luck using this route. I am wondering if I should look to proof by contradiction and/or Induction Method to make another attempt at it.
Any thoughts or leads would be really helpful!
This is actually a special case of a more general result: if $C$ is a set of $m$ words over the alphabet $\{X,Y\}$ such that no word in $C$ is an initial segment of any other word in $C$, then the total length of the words in $C$ is at least $m\lg m$, where $\lg m=\log_2m$ is the binary (base two) logarithm of $m$. This evidently reduces to the desired result when $m=2^n$. I may be missing something simple, but I don’t see a proof of the special case in the question that is any simpler than proofs of the general result. I came up with two proofs of the general result, one using the Kraft-McMillan inequality and one using more elementary ideas.
For convenience let $\ell(w)$ be the length of $w$ for each word $w$ over $\{X,Y\}$, so that the result is that $\sum\limits_{w\in C}\ell(w)\ge m\lg m$.
The Kraft-McMillan inequality says that
$$\sum_{w\in C}2^{-\ell(w)}\le 1\,,$$
and multiplying by $\frac{m}{\sum_{w\in C}2^{-\ell(w)}}$ yields the inequality
$$m\le\frac{m}{\sum\limits_{w\in C}\frac1{2^{\ell(w)}}}\,.\tag{1}$$
The righthand side of $(1)$ is the harmonic mean of the numbers $2^{\ell(w)}$ for $w\in C$, and the harmonic mean of a set of positive real numbers is less than or equal to the geometric mean of those numbers, so
$$m\le\left(\prod_{w\in C}2^{\ell(w)}\right)^{1/m}=2^{\frac1m\sum_{w\in C}\ell(w)}\,.$$
Now take binary logs on both sides:
$$\lg m\le\frac1m\sum_{w\in C}\ell(w)\,,\tag{2}$$
so
$$\sum_{w\in C}\ell(w)\ge m\lg m\,.$$
If the Kraft-McMillan inequality isn’t available, one can prove the result by induction on $m$. You can easily verify that it holds for $m=1$ and $m=2$. Now assume that the result holds for all such sets of fewer than $m$ words. Arrange the words over the alphabet $\{X,Y\}$ in an infinite binary tree $T$:
The left child of a word $w$ is $wX$, and the right child is $wY$. Observe that a word $u$ is a prefix of a word $w$ if and only if the vertex $u$ is on the path from the root $*$ to the word $w$. Thus, the $m$ words must lie on $m$ distinct branches of $T$. If $m=2^n$, for instance, they could (but need not) be the $2^n$ words on Level $n$, in which case their total length would be exactly $n2^n$.
Let $C_X=\{w:Xw\in C\}$ and $C_Y=\{w:Yw\in C\}$; clearly no word in $C_X$ is an initial segment of any other word in $C_X$, and similarly for $C_Y$. Let $m_X=|C_X|$ and $m_Y=|C_Y|$, and assume for now that $m_X\ne 0\ne m_Y$. The induction hypothesis then ensures that
$$\sum_{w\in C_X}\ell(w)\ge m_X\lg m_X$$
and
$$\sum_{w\in C_Y}\ell(w)\ge m_Y\lg m_Y\,.$$
Now $m_X+m_Y=m$, so
$$\begin{align*} \sum_{w\in C}\ell(w)&=\sum_{w\in C_X}\big(1+\ell(w)\big)+\sum_{w\in C_Y}\big(1+\ell(w)\big)\\ &=m_X+\sum_{w\in C_X}\ell(w)+m_Y+\sum_{w\in C_Y}\ell(w)\\ &=m+\sum_{w\in C_X}\ell(w)+m_Y+\sum_{w\in C_Y}\ell(w)\\ &\ge m+m_X\lg m_X+m_Y\lg m_Y\\ &=m+m_X\lg m_X+(m-m_X)\lg(m-m_X)\,. \end{align*}$$
It’s an easy exercise in basic calculus to determine that the function
$$f(x)=x\lg x+(a-x)\lg(a-x)$$
has a minimum at $x=\frac{a}2$, so
$$m_X\lg m_X+(m-m_X)\lg(m-m_X)\ge m\lg\frac{m}2=m(\lg m-1)\,,$$
and
$$\sum_{w\in C}\ell(w)\ge m+m(\lg m-1)=m\lg m\,,$$
as desired.
What if one of $m_X$ and $m_Y$ is $0$? In that case the words in $C$ all share a non-empty maximal common initial segment $u$, and since $u$ is maximal, both $uX$ and $uY$ are initial segments of at least one word in $C$. Let $C_0=\{w:uw\in C\}$, the set of words obtained by removing the common initial segment $u$ from the words in $C$. Then
$$\begin{align*} \sum_{w\in C}\ell(w)&=m\ell(u)+\sum_{w\in C_0}\ell(w)\\ &\ge m\ell(u)+m\lg m\\ &>m\lg m\,. \end{align*}$$
The result now follows by induction.