Information Theoretic View of Stein's Lemma


Suppose $P$ and $Q$ are two probability distributions on a finite alphabet $\mathbb{X}$. Suppose that a statistician has to decide between $P$ and $Q$ based on an i.i.d. sample $X_1,X_2,\dots, X_k =: X^k\in\mathbb{X}^k$. The decision is based on a set $A\subseteq\mathbb{X}^k$ (the decision region): $P$ is accepted if $X^k\in A$, and $Q$ otherwise.

In this set-up, we are interested in the smallest achievable probability of error of the second kind (accepting $P$ when $Q$ is the true distribution),

$$\beta(k,\epsilon) := \min_{A\subseteq \mathbb{X}^k:P^k(A)\ge 1-\epsilon}Q^k(A),$$ where $P^k(x^k):=\prod_{i=1}^{k}P(x_i)$ and $Q^k(x^k):=\prod_{i=1}^{k}Q(x_i)$.
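For a small alphabet and small $k$, $\beta(k,\epsilon)$ can be computed directly: by the Neyman–Pearson lemma, an optimal deterministic decision region collects sequences in decreasing order of the likelihood ratio $P^k(x^k)/Q^k(x^k)$ until the constraint $P^k(A)\ge 1-\epsilon$ is met (exact up to tie-breaking at the threshold). A minimal sketch with a hypothetical binary example, where the distributions and parameters are illustrative and not from the question:

```python
import itertools
import math

# Hypothetical example: binary alphabet, P = Bernoulli(0.6), Q = Bernoulli(0.3).
P = {0: 0.4, 1: 0.6}
Q = {0: 0.7, 1: 0.3}
k, eps = 10, 0.1

# Enumerate all sequences x^k with their product probabilities under P and Q.
seqs = []
for xk in itertools.product(P.keys(), repeat=k):
    pk = math.prod(P[x] for x in xk)
    qk = math.prod(Q[x] for x in xk)
    seqs.append((pk, qk))

# Greedily build A in decreasing order of the likelihood ratio P^k/Q^k
# until P^k(A) >= 1 - eps; this is the Neyman-Pearson region up to ties.
seqs.sort(key=lambda s: s[0] / s[1], reverse=True)
p_mass, beta = 0.0, 0.0
for pk, qk in seqs:
    if p_mass >= 1 - eps:
        break
    p_mass += pk
    beta += qk

print(beta)  # beta(k, eps): the minimal type-II error probability
```

Enumerating $|\mathbb{X}|^k$ sequences is only feasible for toy cases, of course; for larger $k$ one works with types or exponents instead.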

Stein's lemma says that \begin{equation} \tag{*}\label{*} \lim_{k\to\infty}\frac{1}{k}\log \beta(k,\epsilon) = -D(P\|Q), \end{equation} where $$D(P\|Q):=\sum_{x\in\mathbb{X}}P(x)\log\frac{P(x)}{Q(x)}.$$ Interestingly, the proof of this result uses the following set (called a relative-entropy typical set in information theory), which also looks practical to work with: \begin{equation} B(k,\delta) := \left\{x^k: D(P\|Q)-\delta\le\frac{1}{k}\log\frac{P^k(x^k)}{Q^k(x^k)}\le D(P\|Q)+\delta\right\}. \end{equation} (Note that under $P$, the law of large numbers gives $\frac{1}{k}\log\frac{P^k(X^k)}{Q^k(X^k)}\to D(P\|Q)$, so $P^k(B(k,\delta))\to 1$.) Using $B(k,\delta)$ as the decision region achieves the limit in \eqref{*}.

My question is: is this the case in statistics too? That is, is the set $B(k,\delta)$ also used as the decision region in statistics? If so, how should one choose $\delta$? Any help is greatly appreciated.
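One can check numerically that the typical-set region achieves an exponent within $\delta$ of $-D(P\|Q)$. The sketch below (again a hypothetical Bernoulli example, with illustrative parameters) exploits the fact that for a binary alphabet the log-likelihood ratio of $x^k$ depends only on the number of ones $m$, so $Q^k(B(k,\delta))$ can be summed exactly over types, in log-space to avoid underflow:

```python
import math

# Hypothetical example: P = Bernoulli(0.6), Q = Bernoulli(0.3) on {0, 1}.
p1, q1 = 0.6, 0.3
p0, q0 = 1 - p1, 1 - q1

# D(P||Q) in nats.
D = p0 * math.log(p0 / q0) + p1 * math.log(p1 / q1)
delta = 0.05

def log_binom(k, m):
    """log C(k, m) via lgamma, to stay in log-space for large k."""
    return math.lgamma(k + 1) - math.lgamma(m + 1) - math.lgamma(k - m + 1)

def exponent(k):
    """Return (1/k) * log Q^k(B(k, delta)), summed exactly over types.

    A sequence with m ones has normalized log-likelihood ratio
    (m log(p1/q1) + (k-m) log(p0/q0)) / k, so membership in B(k, delta)
    depends only on m.
    """
    terms = []
    for m in range(k + 1):
        llr = (m * math.log(p1 / q1) + (k - m) * math.log(p0 / q0)) / k
        if D - delta <= llr <= D + delta:
            # log of Q^k-mass of the type class with m ones.
            terms.append(log_binom(k, m) + m * math.log(q1) + (k - m) * math.log(q0))
    mx = max(terms)  # log-sum-exp for numerical stability
    log_qB = mx + math.log(sum(math.exp(t - mx) for t in terms))
    return log_qB / k

for k in (50, 200, 1000):
    print(k, exponent(k), -D)  # exponent(k) settles within delta of -D
```

This illustrates the two bounds used in the proof: $Q^k(B(k,\delta))\le e^{-k(D(P\|Q)-\delta)}$ term by term, and $Q^k(B(k,\delta))\ge P^k(B(k,\delta))\,e^{-k(D(P\|Q)+\delta)}$ with $P^k(B(k,\delta))\to 1$.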

PS: I asked the same question on stat.stackexchange, where it didn't attract much attention, so I'm asking it here.