I need to prove the following in coding theory:
For a set $\mathcal{X}$, the type of a sequence $x_1^n = (x_1,\ldots, x_n) \in \mathcal{X}^n$ is its empirical distribution $\hat{P}=\hat{P}_{x_1^n}$, i.e. the distribution defined by $\hat{P}(a) = \frac{|\{i\colon x_i = a\}|}{n}$. A distribution $P$ on $\mathcal{X}$ is called an $n$-type if it is the type of some $x_1^n\in \mathcal{X}^n$. The set of all sequences $x_1^n \in \mathcal{X}^n$ of type $P$ is called the type class of the $n$-type $P$, and is denoted by $\mathcal{T}_P^n$. Let $A_n \subset \mathcal{X}^n$ be the union of all type classes $\mathcal{T}_P^n$ with $H(P) \leq R$, where $H$ is the entropy. Then the following hold:
- $\frac{1}{n} \log |A_n| \to R$ as $n\to\infty$ ($\log$ here is the base 2 logarithm).
- $Q^n(A_n) \to 1$ as $n\to\infty$ for every distribution $Q$ on $\mathcal{X}$ with $H(Q) < R$
For clarity, for any distribution $P$ on $\mathcal{X}$, $P^n$ denotes the distribution of $n$ independent drawings from $P$, that is, $P^n(x_1^n) = P(x_1)\ldots P(x_n)$. Also, for any $A\subset\mathcal{X}$, $P(A)=\sum\limits_{a\in A} P(a)$.
For 1), I want to use the fact (Lemma 2.2 here) that for any $n$-type $P$, we have $$ {n + |\mathcal{X}|-1 \choose |\mathcal{X}| -1}^{-1} 2^{nH(P)} \leq |\mathcal{T}_P^n| \leq 2^{nH(P)} $$ but I fail to see how the $H(P)$-s less than $R$ are going to converge to $R$.
For 2) I think the following fact (Lemma 2.3 here) could help, but I don't know how to make the proof exact:
For any distribution $P$ and any $n$-type $Q$ we have
$$ {n + |\mathcal{X}|-1 \choose |\mathcal{X}| -1}^{-1} 2^{-nD(Q||P)}\leq P^n( \mathcal{T}_Q^n) \leq 2^{-nD(Q||P)} $$
The first question relies on the following observations:
a) The total number of $n$-types is 'small', i.e., bounded as $(n+1)^{|\mathcal{X}|}$. Indeed, the number of $n$-types is the number of nonnegative solutions to $$ n_1 + n_2 + \cdots + n_{|\mathcal{X}|} = n,$$ which works out, by stars and bars, to $\binom{n + |\mathcal{X}|-1}{|\mathcal{X}|-1}$ (this is also the origin of the inverse factor in the inequalities you have written). The key point is that this quantity grows polynomially in $n.$
b) For any distribution $P$, there exists an $n$-type $P_n$ such that $\|P - P_n\|_1 \le |\mathcal{X}|/n$. I'll leave it to you to prove this statement.
c) Entropy is continuous in $P$ (w.r.t. the $\ell_1$-topology, say), and so for $\varepsilon > 0,$ there exists $\delta$ such that for any $Q: \|P-Q\|_1 \le \delta, |H(Q) - H(P)| < \varepsilon$.
Now, to work out 1), first note that \begin{align*} |A_n| &= \sum_{n\textrm{-types } P : H(P) \le R } |\mathcal{T}_P^n|\\ &\le \sum_{n\textrm{-types } P : H(P) \le R} 2^{n R} \\ &\le \sum_{n\textrm{-types } } 2^{nR} \\ &\le (n+1)^{|\mathcal{X}|}2^{nR}, \end{align*} and so $ \frac1n \log|A_n| \le R + O(\log(n)/n),$ which shows that $\limsup \frac1n \log|A_n| \le R$.
For the lower bound, let $R^*_n := \max_{n\textrm{-types } P} H(P).$ Then using the lower bound you have, we know that $\frac1n \log|A_n| \ge R^*_n - O(\log(n )/n).$ So, we would conclude that $\liminf \frac1n \log|A_n| \ge R$ if we can argue that $R_n^* \to R$. This follows from b) and c): I'll again leave it to you to complete the argument.
For 2), the ideas are similar. Clearly it suffices to show that for $B_n := \mathcal{X}\setminus A_n = \bigcup_{n\textrm{-types} P : H(P) > R} \mathcal{T}^n_P,$ it holds that $Q^n(B_n) \to 0$. But by the inequality you have, we know that $$ Q^n(B_n) = \sum_{n\textrm{-types} P : H(P) > R} Q^n(\mathcal{T}^n_P) \le (n+1)^{|\mathcal{X}|} \max_{n\textrm{-types} P : H(P) > R} 2^{-n D(P\|Q)}.$$
Now, the key point is that since $H(Q) < R,$ for any $P$ such that $H(P) > R, D(P\|Q) > 0$. This in fact also follows from the continuity of $H(P)$: since the entropy difference is nonzero, $\|P-Q\|$ must be lower bounded, which in turn means that $D(P\|Q)$ must be lower bounded (note that this would break if instead we had assumed that $H(Q) = R$, think about why). Again I'll leave it to you to show all this formally.
In any case, then, we have an upper bound on $1-Q^n(A_n)$ of the form $(n+1)^{|\mathcal{X}|} 2^{-n \varepsilon}$ for some $\varepsilon > 0,$ which decays to $0$ with $n$.