What's the intuition behind typical sequences?


Given a probability distribution $x\mapsto p(x)$, some integer $n>0$, and some $\epsilon>0$, one defines the $\epsilon$-typical sequences as those sequences $\boldsymbol{x}\equiv(x_1,...,x_n)$ satisfying $$\left|-\frac1 n \log[p(\boldsymbol{x})] - H(X) \right| < \epsilon,$$ where $p(\boldsymbol{x})\equiv \prod_{k=1}^n p_X(x_k)$.
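For concreteness, here is a minimal sketch in Python of a check that follows this definition directly (the function name and the dictionary representation of $p$ are my own choices, not standard notation):

```python
import math

def is_weakly_typical(x, p, eps):
    """Return True if |-(1/n) log2 p(x) - H(X)| < eps for the sequence x.

    x   : sequence of symbols, e.g. [0, 1, 1, 0]
    p   : dict mapping each symbol to its probability p(x)
    eps : typicality tolerance
    """
    n = len(x)
    # Entropy H(X) = -sum_x p(x) log2 p(x)
    H = -sum(q * math.log2(q) for q in p.values() if q > 0)
    # Sample entropy -(1/n) log2 [p(x_1) * ... * p(x_n)]
    sample_entropy = -sum(math.log2(p[s]) for s in x) / n
    return abs(sample_entropy - H) < eps
```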

My rough intuition for typical sequences was that they are the ones containing $np_i$ copies of the $i$-th symbol, for each $i$, and this intuition was backed up by the facts that $$\binom{n}{np}\,p^{np} (1-p)^{n(1-p)} \approx 1 \ \text{(to first order in the exponent)}, \qquad p^{np} (1-p)^{n(1-p)} = 2^{-nH(X)}.$$ However, I recently realised that, for binary sequences with $p_0=p_1=1/2$, going by the definition above, all binary sequences are typical, regardless of $\epsilon$ and $n$, since every sequence has probability $p(\boldsymbol{x})=2^{-n}=2^{-n H(X)}$.
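Reusing the `is_weakly_typical` sketch from above (again, my own helper, not library code), the balanced case can be confirmed numerically: every binary sequence passes the test, no matter how small $\epsilon$ is.

```python
from itertools import product

p = {0: 0.5, 1: 0.5}   # balanced binary source, H(X) = 1 bit
n, eps = 8, 1e-9
# All 2^8 binary sequences of length 8 are eps-typical, even for tiny eps,
# because every sequence has probability exactly 2^{-n} = 2^{-n H(X)}.
assert all(is_weakly_typical(x, p, eps) for x in product([0, 1], repeat=n))
```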

So what gives? Is the intuition that typical sequences are those with $np_i$ copies of each symbol tenable at all? Does it fail only for balanced probability distributions, or are there other "edge cases" to be mindful of?


There are 2 answers below.


In the maximal-entropy case ($p=1/2$ in the binary setting), all sequences are typical; this is a feature of the definition.

If $p$ is far from $1/2$, say $p\in(0,1/4)$, then there are far fewer typical sequences and you get compressibility: with high probability you only have to encode about $2^{nH(p)}$ sequences, while the rest occur with some tiny total probability $\epsilon$, say, so you can just flag them with a header symbol and not encode them. This makes little difference to the expected length of your source code.
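As a rough numerical illustration of this compression argument (the source parameter $p=0.2$, the block length, and the window width below are arbitrary choices of mine):

```python
import math

p, n = 0.2, 100
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # about 0.72 bits/symbol

# Probability mass carried by sequences whose count of ones lies within
# +/- delta*n of the mean n*p.
delta = 0.05
lo, hi = math.floor(n * (p - delta)), math.ceil(n * (p + delta))
mass = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

print(f"about 2^{n*H:.0f} typical sequences vs 2^{n} total")
print(f"P(# of ones within {delta:.0%} of n*p) ~ {mass:.2f}")
```

For these numbers, roughly $nH(p)\approx 72$ bits per block suffice to cover almost all of the probability mass, versus the $n=100$ bits needed for an uncompressed sequence.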

And yes, your intuition is correct, in the following sense.

  1. You can define what is called "strong typicality" directly via the empirical symbol counts being very close to $n p_i$ (equivalently, the empirical proportions being close to $p_i$).
  2. The definition given in your question is sometimes called "weak typicality"; it is easier to handle mathematically, since it lets us work with the entropy function directly, without having to manipulate the messy multinomial distribution each time.

However, you can prove stronger results using strong typicality.
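As a small sketch of the difference (one common formulation of strong typicality; the function name and tolerance are mine), note that your balanced-case counterexample already separates the two notions: with $p_0=p_1=1/2$, the all-zeros sequence is weakly typical but not strongly typical.

```python
from collections import Counter

def is_strongly_typical(x, p, eps):
    """Strong typicality (one common variant): every empirical symbol
    frequency is within eps of the true probability, |n_i/n - p_i| <= eps."""
    n = len(x)
    counts = Counter(x)
    return all(abs(counts.get(s, 0) / n - q) <= eps for s, q in p.items())

p = {0: 0.5, 1: 0.5}
x = [0] * 20
# Weakly typical (its sample entropy is exactly H = 1 bit) ...
# ... but not strongly typical: the empirical frequency of 1 is 0, not ~0.5.
print(is_strongly_typical(x, p, eps=0.1))   # False
```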

Edit: See, for example, Section 10.6 of Cover and Thomas' book Elements of Information Theory (2nd ed.), entitled "Strongly Typical Sequences and Rate Distortion". Essentially, tight two-sided (both lower and upper) convergence bounds can be proved for the quantities of interest.



> My rough intuition for typical sequences was that they are the ones containing $n p_i$ copies of the $i$-th symbol

That would be a different kind of typicality, where the empirical frequencies (sample proportions) are close to the true frequencies (probabilities). But we don't require that; we require that the "empirical" or sample entropy $\frac1 n \sum_k \log(1/p(x_k))$ is close to the true entropy $H(X)= E[\log(1/p(X))]$.

Related: Exercise 3.4 in the Cover and Thomas textbook studies the relationship between two typical sets: the sequences that are typical with respect to the entropy and those that are typical with respect to the mean.



Edit: Some more elaboration

Let $X$ take values in $\{1,2,3,\dots\}$ and let $p_i=P(X=i)$. Let $\hat p_i=n_i/n$ be the empirical frequency of the value $i$ in an $n$-sized sample $\mathcal X=(x_1, x_2, \dots, x_n)$.

Let $\hat H= \frac{1}{n}\sum_{k=1}^n \log(1/p(x_k))$ be the "sample entropy" and $\Delta_H=\hat H- H$.

Hence the condition for the realization to be typical is $\mathcal X \in A^n_{\epsilon} \iff |\Delta_H| \le \epsilon$.

Now, letting $\ell_i = \log(1/p_i)$,

$$ \Delta_H = \frac{1}{n} \sum_i n_i \ell_i- \sum_i p_i \ell_i = \sum_i \delta_i \ell_i \tag 1$$

where $\delta_i = \hat p_i - p_i$ is the discrepancy between the empirical probability (histogram) and the true probability mass function. Then, by the Cauchy–Schwarz inequality:

$$ |\Delta_H|^2 \le (\sum \delta_i^2) (\sum \ell_i^2) \tag 2$$

This implies that, if the realization is "typical with respect to the empirical frequencies" (so that $\sum_i \delta_i^2 \le \epsilon'$), then it is also typical with respect to the entropy (with a different tolerance, $\epsilon = \sqrt{\epsilon' \sum_i \ell_i^2}$).

But, as your counterexample already showed, the converse is not true. In particular, if $\ell_i$ is constant in $i$, then automatically $\Delta_H=0$.
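A quick numerical sanity check of identity $(1)$ and bound $(2)$, with an arbitrarily chosen three-symbol source (all parameter values here are my own illustration):

```python
import math
import random
from collections import Counter

p = {1: 0.5, 2: 0.3, 3: 0.2}
ell = {i: math.log2(1 / q) for i, q in p.items()}            # l_i = log(1/p_i)
H = sum(q * ell[i] for i, q in p.items())                    # true entropy

random.seed(0)
n = 1000
x = random.choices(list(p), weights=list(p.values()), k=n)   # n-sized sample

counts = Counter(x)
delta = {i: counts.get(i, 0) / n - q for i, q in p.items()}  # empirical - true
sample_H = sum(ell[s] for s in x) / n                        # sample entropy
Delta_H = sample_H - H

# Identity (1): Delta_H = sum_i delta_i * l_i
assert math.isclose(Delta_H, sum(delta[i] * ell[i] for i in p), abs_tol=1e-9)
# Bound (2): Cauchy-Schwarz
assert Delta_H**2 <= sum(d**2 for d in delta.values()) * sum(l**2 for l in ell.values()) + 1e-12
```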