I just learned about Chernoff Bounds and am wondering if one can prove the Asymptotic Equipartition Property using them instead of the Weak Law of Large Numbers (which is the consequence of the Chebyshev Inequality). I started with a what I thought was a simple case: sequence of i.i.d. standard Gaussian random variables $\{X_i\}$ where $X_i\sim\mathcal{N}(0,\sigma^2)$. We are interested in proving:
$$P\left(\left|-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)-H(X)\right|\geq\epsilon\right)\rightarrow 0$$
First step in applying Chernoff Bound is finding the Moment Generating Function. However, I ran into trouble immediately:
$$\begin{array}{rcl}E\left[\exp(-t\log(\phi(x))\right]&=&\int e^{-t\log(\phi(x))}\phi(x)dx\\ &=&\int\exp\left(-t\log\left(\frac{1}{\sqrt{2\pi}\sigma}e^{-x^2/\sigma^2}\right)\right)\phi(x)dx\\ &=&\int\exp\left(t\left(\log\left(\sqrt{2\pi}\sigma\right)+x^2/\sigma^2\right)\right)\phi(x)dx\\ &=&\int\frac{\left[2\pi\sigma^2\right]^{\frac{t}{2}}\exp\left(tx^2/\sigma^2\right)}{\sqrt{2\pi}\sigma}e^{-x^2/2\sigma^2}dx\\ &=&\int\frac{\left[2\pi\sigma^2\right]^{\frac{t}{2}}}{\sqrt{2\pi}\sigma}\exp\left(\frac{-x^2(1-t)}{2\sigma^2}\right)dx\\ &=&? \end{array}$$ where $\phi(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-x^2/2\sigma^2}$. I am stuck because $t\in\mathbb{R}$ and thus can be greater than one...
What did I do wrong? Is it even possible to prove AEP using Chernoff Bounds?
Let me abstract a little bit the situation and show that the inequality you have in mind stems from a rather general result (as mentioned in the comments, the following should be explained in every introductory text on large deviations principles). Consider some integrable random variables $\xi$ and $(\xi_n)_{n\geqslant1}$ and, for some $n\geqslant1$ and some positive $x$, the event
Using first $\xi_i=\log\phi(X_i)$ and then $\xi_i=-\log\phi(X_i)$, the event you consider is the union of two events $A_n$, hence our task is to bound $\mathrm P(A_n)$.
Now comes an additional, non trivial, hypothesis: assume that $\xi$ is exponentially integrable, that is:
Then the proof is simple. Start from the almost sure inequality, valid for every nonnegative $t$: $$ \mathrm e^{nt(\mathrm E(\xi)+x)}\,\mathbf 1_{A_n}\leqslant\mathrm e^{t(\xi_1+\cdots+\xi_n)}=\mathrm e^{t\xi_1}\cdots\mathrm e^{t\xi_n}. $$ Integrate both sides and use the independence assumption. This yields $$ \mathrm e^{nt(\mathrm E(\xi)+x)}\,\mathrm P(A_n)\leqslant\mathrm E(\mathrm e^{t\xi})^n. $$ We proved that, for every nonnegative $t$, $$ \mathrm P(A_n)\leqslant u(t)^n\qquad\text{where}\qquad u(t)=\mathrm E(\mathrm e^{t\xi})\mathrm e^{-t(\mathrm E(\xi)+x)}. $$ Note that a lot of these inequalities are just useless. For example, $u(0)=1$ hence the $t=0$ version is $\mathrm P(A_n)\leqslant 1$... Big deal! Even worse: if $t\gt0$ is such that $u(t)\gt1$, or such that $\mathrm E(\mathrm e^{t\xi})$ is infinite, we painfully proved that $\mathrm P(A_n)$ is bounded by something greater than $1$, or infinite. :-)
On the other hand, if there exists at least some nonnegative $t$ such that $u(t)\lt1$, the inequality becomes a strong control on $\mathrm P(A_n)$, namely, that $\mathrm P(A_n)\to0$ at least as fast as a geometric sequence. Hence the key question becomes:
To answer this, let us consider the limit $t\to0$ with $t\gt0$. Then, $$ \mathrm E(\mathrm e^{t\xi})=1+t\mathrm E(\xi)+o(t),\qquad \mathrm e^{-t(\mathrm E(\xi)+x)}=1-t(\mathrm E(\xi)+x)+o(t), $$ hence $u(t)=1-tx+o(t)$. Since $x\gt0$, this limited expansion implies that $u(t)\lt1$ for (at least) some (small) $t\gt0$. The proof is over.
To sum up, the result stems from a Markov-type inequality plus a limited expansion of the Laplace transform $t\mapsto\mathrm E(\mathrm e^{t\xi})$ at $t=0$. The Markov-type inequality is powerful because the $\xi_n$ are independent. The limited expansion is valid because we assumed $(\ast)$.
Note that the proof above uses the finiteness of $\mathrm E(\mathrm e^{t\xi})$ for $0\leqslant t\leqslant t_0$ only. The finiteness for $-t_0\leqslant t\leqslant 0$ is needed when dealing with the event $[\xi_1+\cdots+\xi_n\leqslant n\mathrm E(\xi)-nx]$, hence the control $(\ast)$ must indeed be two-sided.
Coming back at last to the question you asked, if $\xi=\log\phi(X)$ with $X$ centered normal of variance $\sigma^2$ and $\phi$ the density of the distribution of $X$, then $\mathrm E(\mathrm e^{t\xi})$ is finite if and only if $\mathrm E(\phi(X)^{t})$ is finite if and only if $\mathrm E(\mathrm e^{-tX^2/(2\sigma^2)})$ is finite if and only if $t+1\gt0$, hence $(\ast)$ holds with $t_0=1$.