The book "Probability, Random Processes, and Statistical Analysis" by Hisashi Kobayashi, Brian L. Mark, and William Turin discusses the role of entropy in characterising typical sequences (page 257). It says that in a coin tossing experiment:
When we change the experiment of fair coin tossing to that of an unfair coin that lands on head with probability $β$, then a sequence that contains $nβ$ heads becomes a typical sequence, and the number of such typical sequences approaches $2^{nH(β)}$ for large $n$. Each typical sequence will occur with probability $2^{−nH(β)}$. There are $2^n$ possible distinct sequences. The difference $2^n − 2^{nH(β)}$ is the number of nontypical sequences, and the total probability of such sequences becomes negligibly small for sufficiently large $n$.
where $H(\beta)=-\beta \log_2\beta-(1-\beta)\log_2(1-\beta)$.
However if I calculate the probability of a nontypical sequence as number of favourable events over the number of possible events, I have: \begin{equation} P(\mathrm{nontypical})=\frac{2^n − 2^{nH(β)}}{2^n}=1-2^{n(H(β)-1)} \end{equation} which goes to $1$ as $n\to\infty$ since $0\leqslant H(β)\leqslant1$ with the logarithms understood in base $2$.
What's wrong with my calculation? It directly contradicts the authors, and that sounds suspicious :-)
I don't much like the quoted paragraph.
First: when we say that a sequence is "typical", we must say typical *with respect to what*. It's typical with respect to the mean if the sample mean equals (approximately) the true mean. It's typical with respect to the entropy if the "sample entropy" $-\log_2 p(X^{[n]})/n$ equals (approximately) the true entropy $H(X)$. The phrase "a sequence that contains $n\beta$ heads becomes a typical sequence" confuses both concepts: it would apply to the "typical with respect to the mean" concept, but we are speaking of "typical with respect to the entropy" here. (The difference is most evident in the fair case: there, *all* sequences are typical with respect to the entropy, but only those having approximately as many zeroes as ones are typical with respect to the mean.)
Second: notice the word "approximately". The concept of typicality is always relative to some $\epsilon$: a sequence is $\epsilon$-typical if its sample entropy lies within $\epsilon$ of $H$. The quoted sentence, again, is confusing in this regard.
Now, to address your doubt: it's still true that the number of typical sequences (with respect to the entropy!) is approximately $2^{nH(\beta)}$. But your assertion
$$P(\mathrm{nontypical})=\frac{2^n - 2^{nH(\beta)}}{2^n}=1-2^{n(H(\beta)-1)}$$
is wrong. What you are evaluating there is the *proportion* of nontypical sequences: their count divided by the total count. That's not a probability, because the sequences are not equiprobable.
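The gap between counting and probability can be made concrete. Here is a sketch (the parameters $n=1000$, $\beta=0.3$, $\epsilon=0.05$ are mine, chosen for illustration): it exploits the fact that all sequences with the same number of heads $k$ are equiprobable, so the $2^n$ sequences collapse into $n+1$ classes.

```python
from math import comb, log2

# Illustrative parameters, not from the book
n, b, eps = 1000, 0.3, 0.05
H = -b * log2(b) - (1 - b) * log2(1 - b)   # H(0.3) ≈ 0.8813 bits

count_typical = 0    # how many sequences are entropy-typical
prob_typical = 0.0   # their total probability mass
for k in range(n + 1):
    # every sequence with k heads has probability b^k (1-b)^(n-k)
    sample_entropy = -(k * log2(b) + (n - k) * log2(1 - b)) / n
    if abs(sample_entropy - H) < eps:
        count_typical += comb(n, k)
        # work in log space: the binomials are astronomically large
        prob_typical += 2.0 ** (log2(comb(n, k))
                                + k * log2(b) + (n - k) * log2(1 - b))

count_fraction = count_typical / 2 ** n
print(f"fraction of all sequences that are typical: {count_fraction:.3e}")
print(f"probability of drawing a typical sequence:  {prob_typical:.4f}")
```

The typical set is a vanishing *fraction* of all sequences (so your count-based ratio of nontypical sequences is indeed close to $1$), yet it carries almost all the *probability* (so the probability of a nontypical sequence is close to $0$). Both statements are true at once precisely because the sequences are not equiprobable.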
Precisely, the essence of the matter (the asymptotic equipartition property, AEP) is:
Update: typical sequences are (approximately) equiprobable, indeed, by their very definition: the probability of each one is (approximately) $2^{-nH}$. And because we know that their total probability is (approximately) $1$, their total count must be (approximately) $2^{nH}$.
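The AEP itself is just the law of large numbers applied to $-\log_2 p(X^{[n]})/n$, and a quick simulation shows the concentration. This is a sketch with parameters of my own choosing ($\beta=0.3$, $n=2000$, $\epsilon=0.05$), not the book's:

```python
import random
from math import log2

random.seed(0)
b = 0.3
H = -b * log2(b) - (1 - b) * log2(1 - b)   # true entropy, ≈ 0.8813 bits

n, trials, eps = 2000, 300, 0.05
within = 0
for _ in range(trials):
    heads = sum(random.random() < b for _ in range(n))
    # per-symbol log-probability of the drawn sequence
    sample_entropy = -(heads * log2(b) + (n - heads) * log2(1 - b)) / n
    if abs(sample_entropy - H) < eps:
        within += 1

print(within / trials)  # close to 1: almost every drawn sequence is typical
```

Almost every sequence you actually *draw* is typical, even though typical sequences are a negligible fraction of all possible sequences.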