I'm reading about Huffman coding, more precisely, the proof that it is an optimal code. Most proofs I found assume the existence of an optimal code without justifying it. This gap is pointed out at the beginning of the video, which then proves that an optimal code exists.
In the video, I didn't understand why there exists $i$ such that $\sup_n l_i^{(n)}=\infty$, so I tried to prove the statement myself.
My question is whether the following "proof" is correct:
Let $\mathcal C=\{C: C\text{ is a uniquely decodable code}\}$ and let $\mathcal L=\{L_C: L_C \text{ is the expected length of the code }C \in\mathcal C\}$. Then $\mathcal L$ is a non-empty set of positive integers, so by the well-ordering principle, it has a least element. The least element must correspond to the optimal code (by definition of optimality).
Well, the expected length need not be an integer: you're averaging, and there are plenty of distributions on the integers with a noninteger mean (e.g., the Bernoulli distributions). For a nonempty set of reals (which $\mathcal{L}$ is, because the Huffman code is uniquely decodable) that is bounded below (which $\mathcal{L}$ is, since lengths are nonnegative), there *is* an infimum, but the infimum need not be achieved: for $S = (0,1)$, $\inf_{x \in S} x = 0$, but no $x \in S$ is actually $0$, so the infimum is not attained in $S$. What is true is that there is always a sequence of elements of the set that converges to the infimum.
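To make the first point concrete, here is a tiny sketch (with made-up numbers, not from the question) computing the expected length of a two-codeword code under a Bernoulli-type source; the result is not an integer:

```python
# Hypothetical two-symbol alphabet with codewords "0" and "10"
# (lengths 1 and 2) under a Bernoulli(0.7) source distribution.
probs = [0.7, 0.3]
lengths = [1, 2]

# Expected length is a probability-weighted average of the lengths.
expected_length = sum(p * l for p, l in zip(probs, lengths))
print(expected_length)  # 0.7*1 + 0.3*2 = 1.3 (up to float rounding)
```

So $\mathcal{L}$ is a set of positive *reals*, not positive integers, and the well-ordering principle does not apply.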
That's the point of the question in the video: it could be that there is some alphabet $\mathcal{X}$, and message distribution $p$ such that no code actually achieves the infimum, i.e. for any code $C, L_C > \inf \mathcal{L}$.
I do agree that the video skips the key point: why must there be a symbol whose codeword length explodes? To see this, suppose for the sake of contradiction that there were some maximum length $\ell_{\max}$ such that $\ell_x^{(n)} \le \ell_{\max}$ for every $n$ and every symbol $x \in \mathcal{X}$, where $\ell_x^{(n)}$ is the length of the codeword for $x$ in $C^{(n)}$. Then there are only finitely many such codes: for each $x$ there are fewer than $2^{\ell_{\max}+1}$ possible codewords (the nonempty binary strings of length at most $\ell_{\max}$), and there are only $|\mathcal{X}|$ symbols, so at most $2^{(\ell_{\max}+1)|\mathcal{X}|}$ codes in total. But then we cannot have an infinite sequence of codes $C^{(n)}$ with $L^{(n)}$ strictly decreasing: with only finitely many codes available, the $L^{(n)}$ take only finitely many values, so eventually they would have to repeat rather than keep decreasing. This contradicts the assumption in the video that $L^{(1)} > L^{(2)} > \cdots$, and we're done.
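The counting step above can be checked by brute force for small parameters. This sketch (the values of `ell_max` and `alphabet_size` are illustrative, not from the argument) enumerates every assignment of a binary codeword of length at most `ell_max` to each symbol, ignoring decodability constraints, which would only shrink the count further:

```python
from itertools import product

# Illustrative parameters: cap codeword lengths at 3 bits,
# over a 2-symbol alphabet.
ell_max = 3
alphabet_size = 2

# All nonempty binary strings of length <= ell_max.
codewords = [
    "".join(bits)
    for length in range(1, ell_max + 1)
    for bits in product("01", repeat=length)
]
# Per symbol: 2 + 4 + ... + 2^ell_max = 2^(ell_max + 1) - 2 choices.
assert len(codewords) == 2 ** (ell_max + 1) - 2

# Every assignment of a codeword to each symbol.
all_codes = list(product(codewords, repeat=alphabet_size))
print(len(all_codes))  # (2^(ell_max+1) - 2)^alphabet_size = 14^2 = 196
```

Finitely many codes means finitely many possible expected lengths, so no strictly decreasing infinite sequence $L^{(1)} > L^{(2)} > \cdots$ can exist under the bounded-length assumption.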