Correct measure in concentration inequalities or hypothesis testing


In most discussions of concentration inequalities, or of rejection regions in hypothesis testing, the underlying measure is left vague. For example, for independent random variables $X_1, \ldots, X_n$ satisfying $0 \le X_i \le 1$, Hoeffding's inequality is usually stated (e.g., on Wikipedia) as

$$\text{P}\left(\overline{X} - \mathbb{E} \overline{X} \ge t\right) \le e^{-2nt^2} \tag{1}$$

where $\overline{X} = \frac{1}{n}(X_1 + \cdots + X_n)$ and $t \ge 0$.
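As a numerical sanity check of $(1)$, here is a minimal simulation sketch. It assumes, purely for illustration, that the $X_i$ are i.i.d. Uniform$[0,1]$ (so $\mathbb{E}\overline{X} = 1/2$); the function name `hoeffding_check` and its parameters are hypothetical. It only illustrates the numerical content of the bound and says nothing about which probability space is meant.

```python
import math
import random


def hoeffding_check(n, t, trials=2000, seed=0):
    """Estimate P(mean - E[mean] >= t) for Uniform[0,1] samples and
    return it together with the Hoeffding bound exp(-2 n t^2)."""
    rng = random.Random(seed)
    expected_mean = 0.5  # E[X_i] for Uniform[0,1] (illustrative assumption)
    exceed = 0
    for _ in range(trials):
        # One trial: draw n independent values and average them.
        sample_mean = sum(rng.random() for _ in range(n)) / n
        if sample_mean - expected_mean >= t:
            exceed += 1
    return exceed / trials, math.exp(-2 * n * t * t)


if __name__ == "__main__":
    freq, bound = hoeffding_check(n=20, t=0.1)
    print(f"empirical frequency: {freq:.4f}, Hoeffding bound: {bound:.4f}")
```

The empirical frequency should sit well below the bound, since Hoeffding's inequality is not tight for this distribution.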

To make this concrete, suppose $(\Omega, \mathcal{F}, \mathbb{P})$ is the underlying probability space. Do we then treat $\overline{X}$ as a function on $\Omega$ or on $\Omega^n$? The answer determines whether "$\text{P}$" in $(1)$ is $\mathbb{P}$ or $\mathbb{P}^{\otimes n}$ (the product measure on $(\Omega^n, \mathcal{F}^{\otimes n})$). Looking at the proof of the inequality, I think $\overline{X}$ should be a function on $\Omega$.

But now consider the more general McDiarmid's inequality, which states (see also Ledoux's book The Concentration of Measure Phenomenon): let $(K, \mathcal{A}, \mu)$ be a probability space, let $L > 0$ be a constant, and let $f \colon K^n \to \mathbb{R}$ be a measurable function (for the product $\sigma$-algebra $\mathcal{A}^{\otimes n}$) which is $L$-Lipschitz for the normalized Hamming metric (i.e., $d(x,y) = \frac{1}{n}\left|\{i=1,\ldots,n : x_i \neq y_i\}\right|$ for $x,y \in K^n$). Then, for $t \ge 0$,

$$\mu^{\otimes n} \left\{ f - \int_{K^n} f \,\mathrm{d}\mu^{\otimes n}\ge t \right\} \le e^{-2nt^2/L^2}$$

Let $K = [0,1]$, let $X_1, \ldots, X_n$ be i.i.d. random variables (mapping $\Omega$ to $K$) with distribution $\mu$ (i.e., $\mu = \mathbb{P} \circ X_i^{-1}$), and let $f(x_1, \ldots, x_n) = \frac{1}{n}(x_1 + \cdots + x_n)$. Then $f$ is clearly $1$-Lipschitz, since each $x_i \in [0,1]$. Denote by $X$ the function $(X_1, \ldots, X_n) \colon \Omega^n \to K^n$, acting coordinatewise: $X(\omega_1, \ldots, \omega_n) = (X_1(\omega_1), \ldots, X_n(\omega_n))$. Change of variables implies

$$\int_{K^n} f \, \mathrm{d}\mu^{\otimes n} = \int_{\Omega^n} f(X) \, \mathrm{d} \mathbb{P}^{\otimes n} = \int_\Omega X_1 \, \mathrm{d}\mathbb{P} = \mathbb{E}(X_1),$$

and we can write

$$f(X) = \overline{X}.$$
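The claim that the averaging function $f$ is $1$-Lipschitz for the normalized Hamming metric can be spot-checked numerically. The sketch below is only an illustration, not a proof: the helper names (`normalized_hamming`, `mean`, `max_lipschitz_ratio`) are hypothetical, and the points are drawn uniformly from $[0,1]^n$ as an arbitrary choice.

```python
import random


def normalized_hamming(x, y):
    """d(x, y) = (1/n) * #{i : x_i != y_i}."""
    return sum(1 for a, b in zip(x, y) if a != b) / len(x)


def mean(x):
    """f(x_1, ..., x_n) = (x_1 + ... + x_n) / n."""
    return sum(x) / len(x)


def max_lipschitz_ratio(n=10, pairs=1000, seed=0):
    """Largest observed |f(x) - f(y)| / d(x, y) over random pairs in [0,1]^n."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(pairs):
        x = [rng.random() for _ in range(n)]
        # Resample a random subset of coordinates, so x and y differ
        # in some positions only.
        y = [rng.random() if rng.random() < 0.5 else xi for xi in x]
        d = normalized_hamming(x, y)
        if d > 0:
            worst = max(worst, abs(mean(x) - mean(y)) / d)
    return worst


if __name__ == "__main__":
    print(f"largest observed ratio: {max_lipschitz_ratio():.4f}")
```

The observed ratio should never exceed $1$, because $|f(x) - f(y)| \le \frac{1}{n}\sum_i |x_i - y_i|$ and each differing coordinate contributes at most $1$ when $x_i, y_i \in [0,1]$.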

Therefore, applying the change of variables again together with McDiarmid's inequality (with $L = 1$) gives

$$\mathbb{P}^{\otimes n} \left\{ \overline{X} - \mathbb{E}(X_1) \ge t \right\} \le e^{-2nt^2} \tag{2}$$

And now compare equations $(1)$ and $(2)$. Why the discrepancy in measure?