Problem setting: There are $R$ red balls and $B$ blue balls along with $R$ red buckets and $B$ blue buckets. Consider the case when $R \ll B$
Event: $R+B$ balls are arranged in $R+B$ buckets such that each bucket contains one ball, and each of the $(R+B)!$ combinations occur with equal probability.
Quantity of interest: Let $X$ be a random variable which denotes the number of red balls in red buckets. How do we bound the following? $$\mathbb{P}\big( X - \mathbb{E}[X] > t \big)$$
Salient points
Using combinatorics it is easy to show that $$\mathbb{P}\big(X=x\big)= \binom{k}{x}\binom{n-k}{k-x}\Big/\binom{n}{k}$$
If we naively (and wrongly) assume that each red ball falling in a red bucket is an independent event, we have $X\sim Binom(\frac{R}{R+B},R)$ and using Chernoff's bound gives us $$\mathbb{P}\big( X - \mathbb{E}[X] > t \big)\leq \exp-\bigg\{\frac{t^2}{2\mathbb{E}[X]+t}\bigg\}.$$ However, it turns out that we can derive the above result using Stein's method of exchangeable pairs.
My question: For $R\ll B$, I suspect the above concentration can be strengthened to
$$\mathbb{P}\big( X - \mathbb{E}[X] > t \big)\leq \exp-\bigg\{\frac{t^2}{2\mathbb{E}[X]}\bigg\}.$$ Why do I think so?
- Wrongly assuming $X\sim Pois(\mathbb{E}[X])$ or $X\sim \mathcal{N}\big(\mathbb{E}[X],Var(X)\big)$ will give us the above result.
- Simulations also seem to support my claim.