Bernstein inequality for non-centered random variables (Is there a counterexample?)

1.7k Views Asked by At

The usual Bernstein inequality (see e.g. Rauhut + Fourcat, A Mathematical Introduction to Compressive Sensing, Theorem 7.30) states that if $X_1, \dots, X_m$ are independent mean zero random variables with $$ \Bbb{E}|X_\ell|^n \leq \frac{n!}{2}\cdot R^{n-2} \cdot \sigma_\ell^2 \qquad \qquad (\ddagger) $$ for all $n \geq 2$ and $1 \leq \ell \leq m$, then for all $t>0$, we have $$ \Bbb{P}\bigg(\bigg|\sum_{\ell = 1}^M X_\ell \bigg| \geq t\bigg) \leq 2\cdot \exp\bigg(-\frac{t^2 /2}{\sigma^2 + Rt}\bigg) $$ with $\sigma^2 = \sum_\ell \sigma_\ell^2$.

In another paper I have found (without proof and without indication of a source) that even if we do not assume the $X_i$ to have mean zero, we still get $$ \Bbb{P}\bigg(\bigg|\sum_{\ell = 1}^M (X_\ell - \Bbb{E}X_\ell )\bigg| \geq t\bigg) \leq 2\cdot \exp\bigg(-\frac{t^2 /2}{\sigma^2 + Rt}\bigg). \qquad \qquad (\dagger) $$

First of all, I noted that one could try to derive this new claim from the original one by considering the mean zero random variables $Y_\ell := X_\ell - \Bbb{E}(X_\ell)$. But the problem here is that the best estimate I was able to show is \begin{eqnarray*} \Bbb{E}|Y_\ell|^n &\leq& \Bbb{E}( |X_\ell| + \Bbb{E}|X_\ell|)^n \\ &\leq& 2^{n-1} \cdot \Bbb{E}[|X_\ell|^n + (\Bbb{E}|X_\ell|)^n ] \\ &\leq& 2^n \cdot \Bbb{E}|X_\ell|^n \\ & \leq & 2^n \cdot \frac{n!}{2} \cdot R^{n-2} \cdot \sigma_\ell^2\\ &=& \frac{n!}{2} \cdot (2R)^{n-2} \cdot (2\sigma_\ell)^2, \end{eqnarray*} where I used Jensen's inequality to get $(\Bbb{E}|X_\ell|)^n \leq \Bbb{E}|X_\ell|^n$.

But with these bounds for $Y_\ell$ (which are worse than those for $X_\ell$), we only get $$ \Bbb{P}\bigg(\bigg|\sum_{\ell = 1}^M (X_\ell - \Bbb{E}X_\ell )\bigg| \geq t\bigg) \leq 2\cdot \exp\bigg(-\frac{t^2 /2}{4 \sigma^2 + 2Rt}\bigg) \leq 2\cdot \exp\bigg(-\frac{t^2 /8}{\sigma^2 + Rt}\bigg). $$

Summing up, my question is the following: Even though I am not able to deduce the non-centered result from the centered one, is the non-centered result (as in $(\dagger)$) still true, or can one find a counterexample?

My search for a counterexample so far has not been very fruitful. What I tried to do is to find a counterexample for the case $m=1$, i.e. only one random variable, in particular I tried $X$ which only assume values $M$ and $-1$, the value $M$ with low probability $p> 0$, so that the expected value is negative. This will then cause $X - \Bbb{E}X$ to attain even larger values than $X$ with probability $p>0$. But this construction did not produce a counterexample, at least for the choices of $M,p$ that I tried.

1

There are 1 best solutions below

1
On BEST ANSWER

As suggested by @Dominik, we can adopt the usual proof to get the inequality even for non-centered random variables. The point is that the expectation "magically" cancels out.

We assume (as before) that $$ \mathbb{E}\left|X_{\ell}\right|^{n}\leq\frac{n!}{2}\cdot R^{n-2}\cdot\sigma_{\ell}^{2} $$ for all $1\leq\ell\leq m$ and $n\geq2$.

In one direction, we want to estimate for $t>0$ the quantitiy \begin{align*} \mathbb{P}\left(\sum_{\ell=1}^{m}\left(X_{\ell}-\mathbb{E}X_{\ell}\right)\geq t\right) & =\mathbb{P}\left(e^{\theta\cdot\sum_{\ell=1}^{m}\left(X_{\ell}-\mathbb{E}X_{\ell}\right)}\geq e^{\theta t}\right)\\ & \leq e^{-\theta t}\cdot\mathbb{E}\left[e^{\theta\cdot\sum_{\ell=1}^{m}\left(X_{\ell}-\mathbb{E}X_{\ell}\right)}\right]\\ & =e^{-\theta t}\cdot e^{-\theta\sum_{\ell=1}^{m}\mathbb{E}X_{\ell}}\cdot\mathbb{E}\left[e^{\theta\sum_{\ell=1}^{m}X_{\ell}}\right]\\ & =e^{-\theta t}\cdot e^{-\theta\sum_{\ell=1}^{m}\mathbb{E}X_{\ell}}\cdot\prod_{\ell=1}^{m}\mathbb{E}e^{\theta X_{\ell}}, \end{align*} where the last step used indipendence of the $\left(X_{1},\dots,X_{m}\right)$. Now, we use our assumption to bound the moment generating function $\mathbb{E}\left[e^{\theta X_{\ell}}\right]$ as follows: \begin{align*} \mathbb{E}e^{\theta X_{\ell}} & =\sum_{n=0}^{\infty}\frac{\mathbb{E}\left(\theta X_{\ell}\right)^{n}}{n!}\\ & =1+\theta\mathbb{E}X_{\ell}+\sum_{n=2}^{\infty}\frac{\theta^{n}}{n!}\cdot\mathbb{E}X_{\ell}^{n}\\ & \leq1+\theta\mathbb{E}X_{\ell}+\sum_{n=2}^{\infty}\frac{\theta^{n}}{n!}\cdot\mathbb{E}\left|X_{\ell}\right|^{n}\\ & \leq1+\theta\mathbb{E}X_{\ell}+\sum_{n=2}^{\infty}\frac{\theta^{n}}{n!}\cdot\frac{n!}{2}R^{n-2}\sigma_{\ell}^{2}\\ & =1+\theta\mathbb{E}X_{\ell}+\frac{\sigma_{\ell}^{2}\theta^{2}}{2}\cdot\sum_{n=2}^{\infty}\left(\theta R\right)^{n-2}\\ & =1+\theta\mathbb{E}X_{\ell}+\frac{\sigma_{\ell}^{2}\theta^{2}}{2}\cdot\frac{1}{1-\theta R}\\ & \leq e^{\theta\mathbb{E}X_{\ell}+\frac{\sigma_{\ell}^{2}\theta^{2}}{2}\cdot\frac{1}{1-\theta R}}. \end{align*} In the last step, we used the well-known inequality $1+x\leq e^{x}$ which follows (e.g.) by convexity of $e^{x}$. Furthermore, we assumed throughout that $\theta R<1$.

Continuing as above, we get \begin{align*} \mathbb{P}\left(\sum_{\ell=1}^{m}\left(X_{\ell}-\mathbb{E}X_{\ell}\right)\geq t\right) & \leq e^{-\theta t}\cdot e^{-\theta\sum_{\ell=1}^{m}\mathbb{E}X_{\ell}}\cdot\prod_{\ell=1}^{m}\mathbb{E}e^{\theta X_{\ell}}\\ & =e^{-\theta t}\cdot e^{-\theta\sum_{\ell=1}^{m}\mathbb{E}X_{\ell}}\cdot\prod_{\ell=1}^{m}e^{\theta\mathbb{E}X_{\ell}+\frac{\sigma_{\ell}^{2}\theta^{2}}{2}\cdot\frac{1}{1-\theta R}}\\ & =e^{-\theta t}\cdot e^{-\theta\sum_{\ell=1}^{m}\mathbb{E}X_{\ell}}\cdot e^{\theta\sum_{\ell=1}^{m}\mathbb{E}X_{\ell}}\cdot e^{\frac{\theta^{2}}{2}\frac{1}{1-\theta R}\cdot\sum_{\ell=1}^{m}\sigma_{\ell}^{2}}\\ & =e^{\frac{\theta^{2}\sigma^{2}}{2\left(1-\theta R\right)}-\theta t}. \end{align*} Note that (magically) the expected value of $X_{\ell}$ canceled. Now, we can choose (just as in the usual proof) $\theta=\frac{t}{\sigma^{2}+Rt}$ (which fulfills $\theta R=\frac{Rt}{\sigma^{2}+Rt}<\frac{Rt}{Rt}=1$) and we get $$ \mathbb{P}\left(\sum_{\ell=1}^{m}\left(X_{\ell}-\mathbb{E}X_{\ell}\right)\geq t\right)\leq e^{-\frac{t^{2}/2}{\sigma^{2}+Rt}} $$ as desired.

Finally, we can replace $X_{\ell}$ by $Y_{\ell}:=-X_{\ell}$ (which still satisfies the same assumptions) and get $$ \mathbb{P}\left(\sum_{\ell=1}^{m}\left(X_{\ell}-\mathbb{E}X_{\ell}\right)\leq-t\right)=\mathbb{P}\left(\sum_{\ell=1}^{m}\left(Y_{\ell}-\mathbb{E}Y_{\ell}\right)\geq t\right)\leq e^{-\frac{t^{2}/2}{\sigma^{2}+Rt}}. $$ The union bound completes the proof.