My lecturer went through the topic of "Jointly Typical Sequences" in my Information Theory course, and one of the properties/lemmas was that $P\left((X^n, Y^n) \in A^{(n)}_\epsilon\right) \to 1$ as $n \to \infty$, where $A^{(n)}_\epsilon$ is the typical set.
He went on to prove that $P\left((X^n, Y^n) \notin A^{(n)}_\epsilon\right) \le P(E_x) + P(E_y) + P(E_{xy})$, where
$E_x = \left\{ \left| -\tfrac{1}{n}\log P(x^n) - H(X) \right| > \epsilon \right\},$
$E_y = \left\{ \left| -\tfrac{1}{n}\log P(y^n) - H(Y) \right| > \epsilon \right\},$
$E_{xy} = \left\{ \left| -\tfrac{1}{n}\log P(x^n, y^n) - H(X,Y) \right| > \epsilon \right\}.$
By Chebyshev's Inequality, $P(E_x) \to 0$, $P(E_y) \to 0$ and $P(E_{xy}) \to 0$, which completes the proof.
I don't understand why $P(E_x) \to 0$, $P(E_y) \to 0$ and $P(E_{xy}) \to 0$ using Chebyshev's Inequality. I know that Chebyshev's Inequality states that
$P(|X - E(X)| > \epsilon) \le \mathrm{Var}(X)/\epsilon^2.$
I could take the log probabilities as $X$ and the entropies as $E[X]$ in Chebyshev's Inequality. However, I would need the variance to go to zero for $P(E_x) \to 0$, $P(E_y) \to 0$ and $P(E_{xy}) \to 0$. This is what I don't get: where does the variance come from, and why does it go to zero? Or is there something else I'm missing?
The statement follows by observing that $X$ in your case is an average. If you assume (as I suppose you did) that the sequence is independent and identically distributed, then
\begin{align} X&=-\frac{1}{n}\log P(x^n)\\ &=-\frac{1}{n}\log \left( \prod_{i=1}^n p(x_i)\right)\\ &= -\frac{1}{n}\sum_{i=1}^n \log p(x_i) \end{align}
which is the average of $n$ independent and identically distributed random variables. If $\sigma^2 = \mathrm{Var}\left(\log p(X_1)\right)$ denotes the per-symbol variance (finite for a finite alphabet), then $\mathrm{Var}(X) = \sigma^2/n$, so Chebyshev's Inequality gives
$$P(E_x) \le \frac{\sigma^2}{n\epsilon^2} \to 0 \quad \text{as } n \to \infty,$$
and the same argument applies to $P(E_y)$ and $P(E_{xy})$.
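You can also see this concentration numerically. Here is a small sketch (not from the original post; the three-symbol distribution is just a made-up example) that draws many i.i.d. length-$n$ sequences and checks that the sample mean of $-\tfrac{1}{n}\log_2 P(x^n)$ sits near $H(X)$ while its variance shrinks roughly like $1/n$:

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.5, 0.3, 0.2])   # hypothetical source distribution
H = -np.sum(p * np.log2(p))     # entropy H(X) in bits

variances = []
for n in [10, 100, 1000]:
    # 5000 independent length-n sequences; one -(1/n) log2 P(x^n) per row
    idx = rng.choice(len(p), size=(5000, n), p=p)
    X = -np.log2(p[idx]).mean(axis=1)
    variances.append(X.var())
    print(f"n={n:4d}  mean={X.mean():.4f}  var={X.var():.6f}  (H(X)={H:.4f})")
```

Each tenfold increase in $n$ should cut the empirical variance by roughly a factor of ten, which is exactly the $\mathrm{Var}(X) = \sigma^2/n$ scaling the Chebyshev argument relies on.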