Confusion on Extremes of Chernoff-Hoeffding Bound


For $X$ a binomial random variable with probability of success $p$, the Chernoff-Hoeffding bound states \begin{equation} \mathrm{Pr}[X \geq (p+t)n] \leq e^{-n\left((p+t)\log(\frac{p+t}{p})+(1-p-t)\log(\frac{1-p-t}{1-p})\right)} = e^{-nD(\{p+t,1-p-t\}||\{p,1-p\})} \end{equation} where $n$ is the number of trials, $D(p||q)$ is the Kullback-Leibler divergence, and $t$ is a free parameter that specifies the threshold being bounded.

Here's where my confusion begins: consider $\mathrm{Pr}[X\geq 0]$. In this case, \begin{equation} (p+t)n=0 \iff t=-p \end{equation} so the Chernoff bound states \begin{equation} \mathrm{Pr}[X\geq 0] \leq e^{-nD(\{0,1\}||\{p,1-p\})} = e^{-n\log(\frac{1}{1-p})} = (1-p)^n < 1 \end{equation} which cannot be true, because the event $X\geq 0$ encompasses ALL possible values of $X$, so its probability must equal $1$.
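For concreteness, here is a quick numerical check of the apparent contradiction; the values $p=0.3$, $n=10$ are arbitrary and chosen only for illustration:

```python
# Arbitrary illustrative values.
p, n = 0.3, 10

# The bound that the argument above produces for Pr[X >= 0].
claimed_bound = (1 - p) ** n
print(claimed_bound)  # about 0.028, strictly less than 1

# But X >= 0 is certain for a binomial count, so Pr[X >= 0] = 1,
# which contradicts the claimed bound.
```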

Where is my error? I've detailed my understanding of the derivation below:

Consider $n$ random, independent trials, each with probability of success $p \in [0,1]$. If $X$ is the number of successes attained in $n$ trials, then $X$ follows a binomial distribution: $$ \mathrm{Pr}[X = i] = {n\choose i}p^i (1-p)^{n-i} $$ The Chernoff bound allows us to bound the tail probability $\mathrm{Pr}[X\geq m]$. Note that $X = \sum_{i=1}^n X_i$, where $X_i=1$ iff trial $i$ succeeds and $X_i=0$ otherwise. Then, for any $\lambda > 0$, \begin{equation} \mathrm{Pr}[X\geq m] = \mathrm{Pr}[e^{\lambda X} \geq e^{\lambda m}] \leq \frac{\mathbb{E}[e^{\lambda X}]}{e^{\lambda m}} \end{equation} where the inequality follows from Markov's inequality. Further, by independence: \begin{equation} \frac{\mathbb{E}[e^{\lambda X}]}{e^{\lambda m}} = \frac{\prod_i \mathbb{E}[e^{\lambda X_i}]}{e^{\lambda m}} = \frac{(pe^\lambda + (1-p))^n}{e^{\lambda m}} \end{equation} Using the parameterization $m=(p+t)n$ and minimizing over all $\lambda > 0$ yields \begin{equation} \mathrm{Pr}[X \geq (p+t)n] \leq e^{-n\left((p+t)\log(\frac{p+t}{p})+(1-p-t)\log(\frac{1-p-t}{1-p})\right)} = e^{-nD(\{p+t,1-p-t\}||\{p,1-p\})} \end{equation} where $D(p||q)$ is the Kullback-Leibler divergence.
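As a numerical sketch of this derivation (not part of the original argument; the parameter values below are arbitrary), one can check that plugging the minimizing $\lambda = \log\left(\frac{m(1-p)}{(n-m)p}\right)$ into the Markov-inequality bound reproduces the KL-divergence form:

```python
import math

def chernoff_raw(p, n, m, lam):
    """Un-optimized Chernoff bound: (p e^lam + (1-p))^n / e^{lam * m}."""
    return (p * math.exp(lam) + (1 - p)) ** n / math.exp(lam * m)

def kl_bound(p, n, t):
    """e^{-n D(p+t || p)} with the binary Kullback-Leibler divergence."""
    q = p + t
    d = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))
    return math.exp(-n * d)

# Arbitrary illustrative values with t > 0, so m > pn and lam_star > 0.
p, n, t = 0.3, 20, 0.2
m = (p + t) * n
lam_star = math.log(m * (1 - p) / ((n - m) * p))  # the minimizing lambda

print(chernoff_raw(p, n, m, lam_star), kl_bound(p, n, t))  # equal
```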

It seems like the only room for error in the derivation would be Markov's inequality: $$ \mathrm{Pr}[X \geq a] \leq \frac{\mathbb{E}[X]}{a} $$ But Markov's inequality holds for all $a>0$, and here $a = e^{\lambda m} > 0$ for every $m$ and every $\lambda>0$.


BEST ANSWER

This is a nice question, and you got me confused at first. The problem is that your first inequality only holds for $t \ge 0$.

The correct Chernoff-Hoeffding bound is the following: \begin{align} \mathrm{Pr}[X \geq (p+t)n] \leq e^{-nD(\{p+t,1-p-t\}||\{p,1-p\})}&\quad\text{if }t\ge 0\\ \mathrm{Pr}[X \leq (p+t)n] \leq e^{-nD(\{p+t,1-p-t\}||\{p,1-p\})}&\quad\text{if }t\le 0 \end{align} So you see that for $t=-p$ we get \begin{align} \mathrm{Pr}[X \leq 0] \leq (1-p)^n, \end{align} which is in fact an equality in our case, since $\mathrm{Pr}[X \leq 0] = \mathrm{Pr}[X = 0] = (1-p)^n$.
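A quick check that the lower-tail bound is attained with equality at $t=-p$; the values of $p$ and $n$ below are arbitrary:

```python
import math

p, n = 0.3, 10  # arbitrary illustrative values

# Exact probability: Pr[X <= 0] = Pr[X = 0] for a Binomial(n, p).
exact = math.comb(n, 0) * (1 - p) ** n

# The KL bound at t = -p reduces to (1-p)^n.
bound = (1 - p) ** n

print(exact, bound)  # identical: the bound is tight here
```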

The reason for the two separate bounds is that when you optimize $\lambda$ you set it equal to $\log\left(\frac{m(1-p)}{(n-m)p}\right)$, which equals $0$ at $m=pn$ and is negative for $m<pn$, so the constraint $\lambda>0$ used in the upper-tail derivation fails below the mean.
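The sign behavior of the optimizing $\lambda$ can be seen directly; the values of $p$ and $n$ below are arbitrary, chosen so that $pn = 4$:

```python
import math

p, n = 0.4, 10  # arbitrary illustrative values; pn = 4

for m in (2, 4, 6):
    lam = math.log(m * (1 - p) / ((n - m) * p))
    print(m, lam)
# lam < 0 for m < pn, lam = 0 at m = pn, lam > 0 for m > pn,
# so the lambda > 0 requirement of the upper-tail derivation
# fails whenever m is below the mean.
```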


You are choosing $m = 0$. Thus, in the first step you are stating that $$ P[X \geq 0] = P[\lambda X \geq 0] = P[e^{\lambda X} \geq 1] \le E[e^{\lambda X}] = (pe^{\lambda} + (1 - p))^n $$ for any $\lambda > 0$. But $e^{\lambda} > 1$ for $\lambda > 0$, so $$ pe^{\lambda} + (1 - p) > 1, $$ and the bound exceeds $1$, making it vacuous.
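Numerically (with arbitrary illustrative values), the right-hand side at $m=0$ indeed exceeds $1$:

```python
import math

p, n, lam = 0.3, 10, 0.5  # arbitrary illustrative values

# Markov bound at m = 0: E[e^{lam X}] = (p e^lam + (1-p))^n.
bound = (p * math.exp(lam) + (1 - p)) ** n
print(bound)  # greater than 1 for every lam > 0, so it says nothing about Pr[X >= 0]
```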