High-probability bound on # of times empirical mean fails to concentrate


Previous result

The book "Bandit Algorithms" by Lattimore and Szepesvári includes the following result (Lemma 8.2, page 98), which I have simplified and paraphrased for my use case:

Let $X_1, \dots, X_n$ be a sequence of independent $1$-subgaussian random variables and let $\epsilon > 0$. Denote the empirical mean of the first $t$ variables by $$\hat \mu_t = \frac 1 t \sum_{s=1}^t X_s$$ and the $\epsilon$-overestimate count by $$ \kappa_n = \sum_{t=1}^n \mathbb{I} \left\{ \hat \mu_t \geq \epsilon \right\}, $$ where $\mathbb{I}\{\cdot\}$ denotes an indicator. Then $$ \mathbb{E}[\kappa_n] \leq \frac{2}{\epsilon^2}.$$
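As a sanity check (not part of the book), the bound can be illustrated with a quick Monte Carlo simulation using standard normal variables, which are $1$-subgaussian; all parameter values below are arbitrary choices of mine:

```python
import numpy as np

# Monte Carlo estimate of E[kappa_n] for standard normal X_s,
# compared against the Lemma 8.2 bound 2 / eps^2.
rng = np.random.default_rng(0)
eps, n, trials = 1.0, 50, 2000

x = rng.standard_normal((trials, n))
mu_hat = np.cumsum(x, axis=1) / np.arange(1, n + 1)  # hat mu_t for t = 1..n
kappa = (mu_hat >= eps).sum(axis=1)                  # overestimate count per trial
avg_kappa = kappa.mean()
print(avg_kappa, "<=", 2 / eps**2)
```

In this setup the empirical average comes out well below the bound, consistent with the lemma.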

Desired variant

I need to control the probability that $\kappa_n$ is large, rather than its expected value. In other words, I am looking for a result of the form:

Let $0 < \delta < 1$ be a confidence level. There is a function $f$ such that $$ \Pr \left\{ \kappa_n \geq f \left( \frac 1 \epsilon, \frac 1 \delta, n \right) \right\} \leq \delta. $$ Ideally $f$ would not depend on $n$, but I am not sure if that's possible.

Proof efforts

The expected-value version from Lattimore and Szepesvári relies heavily on linearity of expectation: the first step is that

$$ \mathbb{E}[\kappa_n] = \sum_{t=1}^n \Pr \left\{ \hat \mu_t \geq \epsilon \right\}, $$

which immediately sidesteps the fact that the tail events $\{\hat \mu_t \geq \epsilon\}$ and $\{\hat \mu_{t+1} \geq \epsilon\}$ are not independent. This lets them apply a Chernoff/Hoeffding bound to each summand separately and sum the resulting bounds. For my desired result, however, I have not found a trick that breaks up the dependence in a similar way.
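For concreteness, I believe the summing-up step goes roughly as follows: since $\hat \mu_t$ is $\frac{1}{\sqrt t}$-subgaussian, each term satisfies $\Pr \left\{ \hat \mu_t \geq \epsilon \right\} \leq \exp \left( -\frac{t \epsilon^2}{2} \right)$, and the geometric series gives

$$ \mathbb{E}[\kappa_n] = \sum_{t=1}^n \Pr \left\{ \hat \mu_t \geq \epsilon \right\} \leq \sum_{t=1}^\infty \exp \left( -\frac{t \epsilon^2}{2} \right) = \frac{1}{e^{\epsilon^2/2} - 1} \leq \frac{2}{\epsilon^2}, $$

where the last inequality uses $e^x - 1 \geq x$. None of these steps seems to survive once the summands are replaced by dependent indicator variables.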

Any suggestions would be appreciated!