I'm reading Network Information Theory by El Gamal and Kim. It's a bit terse at times, and I'm trying to build intuition for conditional typicality. The conditional typicality lemma states:
Let $(X,Y) \sim p(x,y)$. Suppose $x^n \in \mathcal{T}_{\epsilon'}^n(X)$ and $Y^n \sim p(y^n|x^n) = \prod_{i=1}^n p_{Y|X}(y_i|x_i)$. Then, for every $\epsilon > \epsilon'$, we have
$\lim_{n \rightarrow \infty} P\{(x^n,Y^n) \in \mathcal{T}_\epsilon^n(X,Y)\} = 1$
I'm not sure how to interpret this. An attempt: suppose $X^n = x^n$ is given, $x^n$ is $\epsilon'$-typical, and each $Y_i$ depends on $x_i$ through $p_{Y|X}$. Then it is very likely that $(x^n, Y^n)$ will be typical for $(X,Y)$, but probably not "quite as" typical as $x^n$, since $\epsilon > \epsilon'$. I.e., if you have correlated sequences and you know one of them is typical in some sense, you can expect the pair of sequences to also be typical, but in some weaker sense.
Is this the right interpretation?
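To check my reading numerically, here is a small simulation sketch. The binary setup (a Bernoulli(1/2) source through a BSC with crossover 0.2) and the use of the book's robust typicality test ($|\hat{\pi}(a,b) - p(a,b)| \le \epsilon\, p(a,b)$ for every symbol) are just my choices for illustration:

```python
import random
from collections import Counter

# Toy setup (my choice): X ~ Bern(1/2), Y is X passed through a BSC(0.2).
p_joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {0: 0.5, 1: 0.5}

def robustly_typical(seq, pmf, eps):
    """Robust typicality: each symbol's empirical frequency is within eps*p of p."""
    n = len(seq)
    counts = Counter(seq)
    return all(abs(counts.get(s, 0) / n - q) <= eps * q for s, q in pmf.items())

random.seed(0)
n, eps_prime, eps = 10_000, 0.05, 0.1   # note eps > eps_prime, as in the lemma

# A fixed eps'-typical x^n: exactly half zeros and half ones.
xn = [0] * (n // 2) + [1] * (n // 2)
assert robustly_typical(xn, p_x, eps_prime)

trials = 200
hits = 0
for _ in range(trials):
    # Y_i ~ p(y|x_i): flip each x_i with probability 0.2.
    yn = [x if random.random() < 0.8 else 1 - x for x in xn]
    if robustly_typical(list(zip(xn, yn)), p_joint, eps):
        hits += 1

print(hits / trials)  # should be close to 1 for large n
```

The fraction does come out near 1, which matches the lemma's $\lim_{n \to \infty} P\{\cdot\} = 1$ as I read it.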
You might want to read Cover & Thomas's Elements of Information Theory (2nd edition) first, particularly the chapter on channel capacity.
Basically, the set of length-$n$ pairs $(x^n, y^n)$ that are $\epsilon$-jointly typical are those whose empirical entropy (i.e. $-\frac{1}{n}\log p(x^n,y^n)$) is within $\epsilon$ of the true entropy $H(X,Y)$, and for which $x^n$ is $\epsilon$-typical, as is $y^n$. So there are roughly $2^{n H(X,Y)}$ jointly typical pairs, whereas there are roughly $2^{n H(X)}$ typical sequences $x^n$ and $2^{n H(Y)}$ typical sequences $y^n$. Hence a pair of independently drawn typical sequences $x^n$ and $y^n$ is jointly typical with probability about $2^{nH(X,Y)}/2^{n(H(X)+H(Y))} = 2^{-n I(X;Y)}$.
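You can check those counts exactly for binary alphabets, since the empirical entropy of a pair sequence depends only on how many times each symbol pair occurs. A sketch, with an illustrative doubly symmetric joint pmf of my own choosing (both marginals uniform, so every $x^n$ and $y^n$ is typical):

```python
from math import comb, log2

# Illustrative joint pmf (my choice): both marginals are uniform on {0,1}.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
H_XY = -sum(q * log2(q) for q in p.values())   # joint entropy, about 1.72 bits
I = 1.0 + 1.0 - H_XY                           # I(X;Y) = H(X) + H(Y) - H(X,Y)

n, eps = 100, 0.05
logp = {s: log2(q) for s, q in p.items()}

# Count the eps-jointly-typical pairs by summing multinomial coefficients over
# all count vectors (a,b,c,d) whose empirical entropy is within eps of H(X,Y).
N_joint = 0
for a in range(n + 1):
    for b in range(n + 1 - a):
        for c in range(n + 1 - a - b):
            d = n - a - b - c
            emp = -(a * logp[(0, 0)] + b * logp[(0, 1)]
                    + c * logp[(1, 0)] + d * logp[(1, 1)]) / n
            if abs(emp - H_XY) <= eps:
                # multinomial coefficient = number of pair sequences with these counts
                N_joint += comb(n, a) * comb(n - a, b) * comb(n - a - b, c)

print(log2(N_joint) / n)       # close to H(X,Y): about 2^{nH(X,Y)} jointly typical pairs
print(log2(N_joint) / n - 2)   # close to -I(X;Y): log-fraction of the 4^n pairs that qualify
```

Since the marginals are uniform here, all $2^n$ sequences $x^n$ (and $y^n$) are typical, so the second printed exponent is exactly the $2^{-nI(X;Y)}$ probability from the argument above.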
You can move from the joint idea to the conditional one in the usual way -- fix an output sequence $y^n$ and look at the set of input sequences $x^n$ that are jointly typical with it.