Question Regarding Vershynin's Proof of Bernstein's Inequality


I have been studying Vershynin's "High-Dimensional Probability," and I am confused about a step in the proof of Bernstein's inequality (Theorem 2.8.2). (Note that $X_1, \dots, X_N$ are independent, mean-zero subexponential random variables.)

By a property of subexponential random variables, we can bound their MGF as follows for $|\lambda| \leq \frac{c}{\max_i \|X_i\|_{\psi_1}}$:

$$\mathbb{E}[\exp (\lambda X_i)] \leq \exp (C \lambda^2 \|X_i\|_{\psi_1}^2).$$

Thus, continuing from a previous step (with $S := \sum_{i=1}^N X_i$), we get $$ \mathbb{P}(S \geq t) \leq \exp (-\lambda t + C \lambda^2 \sigma^2) \quad (*)$$ where $\sigma^2 = \sum_{i=1}^N \|X_i\|_{\psi_1}^2$. Now we minimize this expression in $\lambda$ subject to the constraint, which gives the optimal choice

$$\lambda = \min \left(\frac{t}{2C\sigma^2}, \frac{c}{\max_i \|X_i\|_{\psi_1}}\right).$$

Thus, with this we obtain $$ \mathbb{P}(S \geq t) \leq \exp \left(-\min \left(\frac{t^2}{4C\sigma^2}, \frac{ct}{2 \max_i \|X_i\|_{\psi_1}}\right)\right).$$

Now, my issue is with this last step. I see how the first term in the minimum arises: simply plug the first possible value of $\lambda$ into $(*)$. I would expect the second term to arise the same way, by plugging in the second possible value, but when I do that I don't recover it. Am I missing something?
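For what it's worth, here is a quick numerical sanity check in Python (the parameter values are arbitrary placeholders, not tied to any particular distribution) confirming that the exponent in $(*)$, evaluated at the stated optimal $\lambda$, really is bounded above by the exponent in the final inequality:

```python
import random

def optimal_lam(t, C, sigma2, c, K):
    # lambda = min(t / (2 C sigma^2), c / K), where K = max_i ||X_i||_{psi_1}
    return min(t / (2 * C * sigma2), c / K)

def star_exponent(lam, t, C, sigma2):
    # exponent of the bound (*): -lambda t + C lambda^2 sigma^2
    return -lam * t + C * lam ** 2 * sigma2

def claimed_exponent(t, C, sigma2, c, K):
    # exponent of the final bound: -min(t^2 / (4 C sigma^2), c t / (2 K))
    return -min(t ** 2 / (4 * C * sigma2), c * t / (2 * K))

random.seed(0)
for _ in range(10_000):
    t, C, sigma2, c, K = (random.uniform(0.1, 10.0) for _ in range(5))
    lam = optimal_lam(t, C, sigma2, c, K)
    assert star_exponent(lam, t, C, sigma2) <= claimed_exponent(t, C, sigma2, c, K) + 1e-12
print("bound verified on all sampled parameters")
```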


If you want to check the source material, the book is freely available online at

https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.pdf


2 Answers

Accepted answer:

After talking this through with a friend, I think I have come up with an answer. Note that we can rewrite $$ -\lambda t + C \lambda^2 \sigma^2 = -\lambda(t - C \lambda \sigma^2),$$ and plugging the optimal value of $\lambda$ into the second factor gives $$ t - C \lambda \sigma^2 = t - \min \left(\frac{t}{2}, \frac{cC\sigma^2}{\max_i \|X_i\|_{\psi_1}} \right). $$ The minimum above is at most $\frac{t}{2}$, so $t - C \lambda \sigma^2 \geq \frac{t}{2}$, and therefore $$ \exp (-\lambda t + C \lambda^2 \sigma^2) \leq \exp \left(-\lambda \frac{t}{2}\right). \quad (**)$$ Plugging the optimal $\lambda$ into $(**)$ now yields the claimed inequality.
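Spelling out that very last substitution (this is just an expansion of the plug-in step): since the optimal $\lambda$ is the minimum of the two candidate values,

$$ -\lambda \frac{t}{2} = -\frac{t}{2} \min \left(\frac{t}{2C\sigma^2}, \frac{c}{\max_i \|X_i\|_{\psi_1}}\right) = -\min \left(\frac{t^2}{4C\sigma^2}, \frac{ct}{2\max_i \|X_i\|_{\psi_1}}\right), $$

which is exactly the exponent in the claimed bound.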

Another answer:

Note that $$\frac{t}{2 C \sigma^2} > \frac{c}{\max_i \Vert X_i \Vert_{\psi_1}} \iff \sigma^2 < \frac{t \max_i \Vert X_i \Vert_{\psi_1}}{2 cC}.$$ So if the optimal $\lambda$ is $c / \max_i \Vert X_i \Vert_{\psi_1}$, we can use the above inequality for $\sigma^2$ in your equation $(*)$; simplifying then yields the second case of the claimed probability bound.
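As a quick numerical check of this case analysis (again with arbitrary placeholder values for the constants and norms), one can verify that whenever $\lambda = c / \max_i \Vert X_i \Vert_{\psi_1}$ is the smaller candidate, plugging it into $(*)$ gives an exponent of at most $-ct / (2 \max_i \Vert X_i \Vert_{\psi_1})$:

```python
import random

random.seed(1)
checked = 0
for _ in range(10_000):
    t, C, sigma2, c, K = (random.uniform(0.1, 10.0) for _ in range(5))
    if c / K <= t / (2 * C * sigma2):  # second case: optimal lambda = c / K
        lam = c / K
        # exponent of (*) at lambda = c/K must be at most -c t / (2 K)
        assert -lam * t + C * lam ** 2 * sigma2 <= -c * t / (2 * K) + 1e-12
        checked += 1
print(f"second case verified on {checked} sampled parameter sets")
```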