I am new to machine learning theory and recently read Lemma A.8 on page 363 of the book *Prediction, Learning, and Games*:
Bernstein’s inequality for martingales: Let $X_1, X_2, \ldots, X_n$ be a bounded martingale difference sequence with respect to the filtration $\mathcal{F}=(\mathcal{F}_i)_{1\le i\le n}$, with $|X_i|\le K$ for all $i$. Let $$ S_i=\sum_{j=1}^i X_j $$ be the associated martingale. Denote the sum of the conditional variances by $$ \Sigma_n^2 = \sum_{t=1}^n \mathbb{E}[X_t^2\mid\mathcal{F}_{t-1}]. $$ Then for all constants $t, v \gt 0$, $$ \mathbb{P}\Big[\max_{i=1,\ldots,n} S_i \gt t,\ \Sigma_n^2 \le v\Big] \le \exp\bigg(-\frac{t^2}{2(v+Kt/3)}\bigg). $$
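As a quick numerical sanity check (not a proof), the bound can be verified by simulation on the simplest bounded martingale difference sequence, $X_i = \pm K$ with equal probability. In that case the conditional variances are deterministic, $\Sigma_n^2 = nK^2$, so choosing $v = nK^2$ makes the variance event always hold and the left side is just $\mathbb{P}[\max_i S_i > t]$. The parameter values below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, trials = 100, 1.0, 20000

# X_i = +/- K with equal probability: a bounded martingale difference sequence
X = rng.choice([-K, K], size=(trials, n))
S = np.cumsum(X, axis=1)           # partial sums S_1, ..., S_n per trial
S_max = S.max(axis=1)              # max_i S_i per trial

# Conditional variances are deterministic here: Sigma_n^2 = n * K^2
v = n * K**2
t = 25.0

empirical = np.mean(S_max > t)     # empirical P[max_i S_i > t, Sigma_n^2 <= v]
bound = np.exp(-t**2 / (2 * (v + K * t / 3)))

print(f"empirical = {empirical:.4f}, Bernstein bound = {bound:.4f}")
```

The empirical frequency should land below the bound (for these parameters the bound is roughly $e^{-2.88}\approx 0.056$, while a symmetric random walk exceeds $25$ within $100$ steps far less often).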
The book omits the proof and instead refers to another paper on tail probabilities for martingales. After reading it, however, I still cannot see how to construct the full proof.
Many thanks for any help!