I am trying to follow Terence Tao's notes on concentration of measure, particularly the derivation of equation $(8)$. Suppose $\{X_i\}$ is a collection of random variables normalised to have mean zero and variance at most $1$, and almost surely bounded in magnitude by $M$. Set $S_n = \sum_{i = 1}^n X_i$. One can show that if the $X_i$ are $k$-wise independent for some even positive integer $k$, and if additionally $M^2 \leq \frac{n}{k}$, then $$E|S_n|^k \leq 2(enk/2)^{k/2},$$ so Markov's inequality implies the large deviation inequality
$$P\left(|S_n| \geq \lambda\sqrt{n}\right) \leq 2 \left(\frac{\sqrt{ek/2}}{\lambda}\right)^k \tag{1}$$
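(For reference, here is the Markov step spelled out: since $k$ is even, $|S_n| \geq \lambda\sqrt{n}$ iff $|S_n|^k \geq \lambda^k n^{k/2}$, so
$$P\left(|S_n| \geq \lambda\sqrt{n}\right) \leq \frac{E|S_n|^k}{\lambda^k n^{k/2}} \leq \frac{2(enk/2)^{k/2}}{\lambda^k n^{k/2}} = 2\left(\frac{\sqrt{ek/2}}{\lambda}\right)^k.)$$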
If, instead of just $k$-wise independence, we assume joint independence, Professor Tao then states that equation $(1)$ can be applied for any $k$ such that $M^2 < n/k$, and that we can "optimise" in $k$ by setting $\sqrt{nk}$ to be a small multiple of $\lambda$ to conclude $$P\left(|S_n| \geq \lambda \sqrt{n}\right) \leq C \exp(-c\lambda^2) \tag{2}$$ for some absolute constants $C, c > 0$, provided $|\lambda| \leq c \sqrt{n}/\sqrt{M}$ for some small $c$.
I am unable to deduce equation $(2)$ from equation $(1)$, as I am unsure how to convert bounds that hold for every $k$ with $n \geq M^2 k$ into the exponential form above. I am also confused by what it means to "optimise" in $k$, and by why $(2)$ holds on the region $|\lambda| \leq c \sqrt{n}/\sqrt{M}$. I am hoping someone can explain how to derive $(2)$, or at least give a hint that would make it easy to see why it holds.
Edit: In the errata of the notes it is clarified that $\sqrt{k}$, rather than $\sqrt{nk}$, should be set to a small multiple of $\lambda$. In that case, setting $\lambda = c \sqrt{k}$, for $\lambda \leq c \frac{\sqrt{n}}{M}$ we get \begin{align} P\left( |S_n| \geq \lambda \sqrt{n} \right) & \leq 2 \cdot \left( \frac{e }{2c^2} \right)^{\frac{\lambda^2}{2c^2} }\\ &= 2 \cdot e^{\frac{\lambda^2}{2c^2}\log\left( \frac{e}{2c^2} \right)}. \end{align} For $2 c^2 > e$ the right-hand side is bounded above by $2 \cdot e^{- \tilde{c}\lambda^2}$, where $\tilde{c} = \left|\log\left(\frac{e}{2c^2}\right)\right|/(2c^2)$. However, I'm still confused about a couple of points:
- This requires $c$ to be "large enough" rather than "small enough"
- Equation $(1)$ only holds for (even) integer $k$, so I'm not sure how to pick $c, C$ so that $(2)$ holds independently of $\lambda, n$
Alright, so $k$ is a free parameter (if you assume joint independence, you can pick any $k$ you want). So "optimising over $k$" means "picking the value of $k$ that minimises the right-hand side (i.e., gives the best bound)".
One standard way to do that is to treat the discrete parameter $k$ as a continuous variable, and minimise the function $f$ given by $$ f(x) = 2 \left(\frac{\sqrt{ex/2}}{\lambda}\right)^x $$ over $x \geq 0$ such that $M^2 \leq \frac{n}{x}$, e.g. by differentiation.
That last condition will impact the result, and give some constraint relating $\lambda, M, n$ in the final bound.
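To sketch that computation (ignoring for the moment that $k$ must be an even integer; rounding $x$ to a nearby even integer only changes the constants): taking logarithms,
\begin{align}
\log f(x) &= \log 2 + \frac{x}{2}\log\frac{ex}{2\lambda^2}, \\
\frac{d}{dx}\log f(x) &= \frac{1}{2}\log\frac{ex}{2\lambda^2} + \frac{1}{2} = \frac{1}{2}\log\frac{e^2 x}{2\lambda^2},
\end{align}
which vanishes at $x^* = 2\lambda^2/e^2$. At this point $\frac{ex^*}{2\lambda^2} = 1/e$, so $$f(x^*) = 2\,e^{-x^*/2} = 2\,e^{-\lambda^2/e^2},$$ which is exactly the form of $(2)$. The constraint $x^* \leq n/M^2$ then becomes $\lambda^2 \leq e^2 n/(2M^2)$, i.e. a restriction of the form $\lambda \leq c' \sqrt{n}/M$ for an absolute constant $c'$.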