Are tail bounds on hypergeometric distribution weaker than Chernoff?


From a previous question, I learned about the tail bounds of the hypergeometric distribution that can be summarized as follows:

Let $N$ be the overall number of balls, let $K$ be the number of red balls, and consider drawing $n$ balls uniformly at random without replacement.

Denote the number of drawn red balls by $X$, then $X\sim\mbox{Hypergeometric}(N,K,n)$ and we have that for any $0<t<nK/N$: $$ \Pr[|X-\mathbb E[X]|\ge n\cdot t]\le 2e^{-2t^2\cdot n}. $$
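As a quick sanity check of this bound, here is a pure-Python sketch that computes the exact two-sided tail from the hypergeometric pmf and compares it to $2e^{-2t^2 n}$ (the parameter values $N=100$, $K=30$, $n=20$, $t=0.2$ are arbitrary choices, not from the question):

```python
import math

def hypergeom_pmf(N, K, n, k):
    """P[X = k] for X ~ Hypergeometric(N, K, n)."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def two_sided_tail(N, K, n, t):
    """Exact P[|X - E[X]| >= n*t], summing the pmf over the support."""
    mean = n * K / N
    support = range(max(0, n - (N - K)), min(n, K) + 1)
    return sum(hypergeom_pmf(N, K, n, k) for k in support
               if abs(k - mean) >= n * t)

N, K, n, t = 100, 30, 20, 0.2
exact = two_sided_tail(N, K, n, t)
bound = 2 * math.exp(-2 * t**2 * n)
assert exact <= bound  # the stated tail bound holds
```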

Denote by $p=K/N$ the probability that a specific drawn ball is red, and denote by $Y\sim \mbox{Bin}(n,p)$ a binomial random variable (representing drawing with replacement).

Using the Chernoff bound, we can write, for any $\delta\in(0,1)$: $$ \Pr[|Y-\mathbb E[Y]|\ge np\cdot \delta]\le 2e^{-\delta^2\cdot np/3}. $$

To make the comparison easier, let us set $\delta=t/p$. Then: $$ \Pr[|Y-\mathbb E[Y]|\ge n\cdot t]\le 2e^{-(t/p)^2\cdot np/3}=2e^{-t^2\cdot n/3p}. $$
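To see the gap concretely, note that with $\delta = t/p$ the ratio of the two exponents is exactly $1/(6p)$. A small numerical sketch (the values of $n$, $p$ are arbitrary):

```python
import math

n, p = 1000, 0.01
t = p / 2  # corresponds to delta = t/p = 1/2

hyper_exponent = 2 * t**2 * n          # exponent in the hypergeometric bound
binom_exponent = t**2 * n / (3 * p)    # exponent in the Chernoff bound above

# The binomial exponent is larger (i.e. the bound is tighter) by 1/(6p).
assert math.isclose(binom_exponent / hyper_exponent, 1 / (6 * p))
```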

That is, the bound for $Y$ has an exponent that is larger (hence a tail bound that is tighter) by a $\Theta(1/p)$ factor, which matters when $p$ is small. This seems to contradict the intuition that $X$ "should be" more concentrated around its mean than $Y$, as was also suggested by the answer to my previous question.

Is $X$ truly more concentrated around its mean?

Can we get a bound of $\Pr[|X-\mathbb E[X]|\ge n\cdot t]\le e^{-\Theta(t^2\cdot n/p)}$?


Edit: This paper claims that binomial tail bounds also apply to hypergeometric random variables. However, it is 10 years old and (as far as I can tell) hasn't been peer-reviewed.

1 Answer

Hoeffding's paper Probability inequalities for sums of bounded random variables is the first to compare the hypergeometric and binomial tail bounds. Theorem 4 in that paper states that, in the notation of this question, $$\mathbb E[f(X)] \le \mathbb E[f(Y)]$$ for any continuous and convex function $f$.
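In particular, taking $f(x) = e^{sx}$ (which is convex for every real $s$) gives $\mathbb E[e^{sX}] \le \mathbb E[e^{sY}]$. Here is an exact check of that consequence using the two pmfs, with small, arbitrarily chosen parameters:

```python
import math

def hypergeom_pmf(N, K, n, k):
    """P[X = k] for X ~ Hypergeometric(N, K, n)."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def binom_pmf(n, p, k):
    """P[Y = k] for Y ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

N, K, n = 50, 20, 10
p = K / N
support = range(max(0, n - (N - K)), min(n, K) + 1)
for s in (-1.0, 0.5, 1.0, 2.0):  # e^{sx} is convex for positive and negative s
    mgf_X = sum(math.exp(s * k) * hypergeom_pmf(N, K, n, k) for k in support)
    mgf_Y = sum(math.exp(s * k) * binom_pmf(n, p, k) for k in range(n + 1))
    assert mgf_X <= mgf_Y * (1 + 1e-12)  # Hoeffding: E[f(X)] <= E[f(Y)]
```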

The proof of the Chernoff bound starts from the following two inequalities:

  • $\Pr[Y \ge a] = \Pr[e^{tY} \ge e^{ta}] \le \frac{\mathbb E[e^{tY}]}{e^{ta}}$ for every $t>0$;
  • $\Pr[Y \le a] = \Pr[e^{-tY} \ge e^{-ta}] \le \frac{\mathbb E[e^{-tY}]}{e^{-ta}}$ for every $t>0$.

If our tail bounds on $Y$ go through these two inequalities, then we are safe, because Hoeffding's theorem tells us that $\mathbb E[e^{tX}] \le \mathbb E[e^{tY}]$ and $\mathbb E[e^{-tX}] \le \mathbb E[e^{-tY}]$. So any bounds for $Y$ that rely on bounding $\mathbb E[e^{\pm tY}]$ also apply to $X$.


For completeness, let's see how to get the Chernoff bound in the question that way. Write $Y = \sum_{i=1}^n Y_i$ as a sum of independent indicator variables and split $\mathbb E[e^{tY}]$ up as $\prod_{i=1}^n \mathbb E[e^{tY_i}]$; for each $i$, $e^{tY_i}$ is $e^t$ with probability $p$ and $1$ otherwise, so the expected value is $1 + p(e^t-1) \le e^{p(e^t-1)}$. Therefore $\mathbb E[e^{tY}]/e^{ta} = \exp(np(e^t-1) - ta) = \exp(\mu(e^t-1) - ta)$ where $\mu = np$. We are free to choose $t$, so set $t = \log \frac{a}{\mu}$ (positive whenever $a > \mu$), getting $\exp(a - \mu - a \log \frac{a}{\mu})$.
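Numerically, the resulting bound $\exp(a - \mu - a\log\frac{a}{\mu})$ does dominate the exact binomial upper tail; here is a pure-Python check with arbitrarily chosen parameters:

```python
import math

def binom_upper_tail(n, p, a):
    """Exact P[Y >= a] for Y ~ Bin(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(math.ceil(a), n + 1))

n, p = 100, 0.3
mu = n * p
a = 45  # a > mu, so the optimal t = log(a/mu) is positive

chernoff = math.exp(a - mu - a * math.log(a / mu))
assert binom_upper_tail(n, p, a) <= chernoff
```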

When $a = (1+\delta)\mu$, this is $\exp(\delta \mu - (1+\delta) \mu \log(1+\delta))$. To understand this better, look at the Taylor series for $\delta - (1+\delta) \log(1+\delta)$: it is $-\frac12 \delta^2 + \frac16 \delta^3 - \frac1{12}\delta^4 + O(\delta^5)$. When $\delta$ is close to $0$, we should be getting bounds on the order of $\exp(-\frac12 \delta^2 \mu)$, but for $0 < \delta < 1$, the best we can say is that $\frac{\delta - (1+\delta) \log(1+\delta)}{\delta^2} < -\frac13$, giving us an upper bound of $\exp(-\frac13 \delta^2 \mu)$. (Actually, the $\frac13$ could be $2\log 2 - 1 \approx 0.386$, but who's counting.)
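These claims about the ratio $\frac{\delta - (1+\delta)\log(1+\delta)}{\delta^2}$ are easy to confirm numerically (a small sketch):

```python
import math

def ratio(delta):
    """The factor (delta - (1+delta)*log(1+delta)) / delta^2 from the upper tail."""
    return (delta - (1 + delta) * math.log(1 + delta)) / delta**2

# Near 0 the ratio is about -1/2; it increases to 1 - 2*log(2) at delta = 1,
# so it stays below -1/3 on all of (0, 1].
assert abs(ratio(1e-4) + 0.5) < 1e-3
assert all(ratio(d / 100) < -1/3 for d in range(1, 101))
assert abs(ratio(1.0) - (1 - 2 * math.log(2))) < 1e-12
```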

Doing the same thing for the lower tail, the negative sign actually doesn't matter until we get to the upper bound of $\exp(a - \mu - a \log \frac{a}{\mu})$ as before, but now we are looking at $a = (1-\delta)\mu$. The exponent is $\delta^2 \mu$ times the factor $\frac{(\delta -1) \log (1-\delta )-\delta }{\delta^2}$, which starts off at $-\frac12$ and only keeps decreasing for $\delta>0$.

So when we write $\Pr[|Y -\mu| \ge \delta \mu] \le 2 e^{-\frac13\delta^2\mu}$ for $0<\delta<1$, we are actually bounding the well-behaved lower tail by the poorly behaved upper tail to get a simpler expression; we could do better and say $\Pr[Y \le (1-\delta)\mu] \le e^{-\frac12 \delta^2\mu}$ for all $\delta>0$.
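Checking the sharper lower-tail bound against the exact binomial tail (parameters arbitrary; for $\delta \ge 1$ the tail is $0$, so the bound is trivial there):

```python
import math

def binom_lower_tail(n, p, a):
    """Exact P[Y <= a] for Y ~ Bin(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(0, math.floor(a) + 1))

n, p = 200, 0.4
mu = n * p
for delta in (0.1, 0.5, 0.9, 1.5):
    exact = binom_lower_tail(n, p, (1 - delta) * mu)
    assert exact <= math.exp(-delta**2 * mu / 2)  # P[Y <= (1-delta)mu] <= e^{-delta^2 mu/2}
```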

In any case, Hoeffding's result tells us that we immediately get $\Pr[|X-\mu| \ge \delta \mu]\le 2 e^{-\frac13\delta^2\mu}$ and $\Pr[X\le (1-\delta)\mu] \le e^{-\frac12\delta^2\mu}$ in all the same cases, for all the same reasons.