Can we control the distance between the empirical and theoretical mean on the whole trajectory any better than using Hoeffding and a union bound?


Suppose $X,X_1,X_2,X_3\dots$ is a $\mathbb{P}$-i.i.d. family of $[-1,1]$-valued random variables with $\mathbb{E}[X] = 0$.

Hoeffding's inequality implies that \begin{equation*} \forall T \in \mathbb{N}, \forall \varepsilon > 0, \qquad \mathbb{P}\bigg[\frac{1}{T} \sum_{s=1}^T X_s \ge \varepsilon\bigg] \le \exp\Big(-\frac{T}{2}\varepsilon^2\Big), \end{equation*} from which it follows that \begin{equation*} \forall T \in \mathbb{N}, \forall \delta \in (0,1), \qquad \mathbb{P}\bigg[\frac{1}{T} \sum_{s=1}^T X_s \ge \sqrt{\frac{2}{T}\log\Big(\frac{1}{\delta}\Big)}\bigg] \le \delta. \end{equation*}

Basically, this result states that the process $Y_t := \frac{1}{t}\sum_{s=1}^t X_s$ is below the threshold $\sqrt{\frac{2}{T}\log\Big(\frac{1}{\delta}\Big)}$ with probability at least $1-\delta$ at any single prescribed time $T$.

Instead, suppose that we wanted the whole trajectory $Y_1, Y_2,\dots,Y_T$ to stay below the thresholds $\sqrt{\frac{2}{1}\log\Big(\frac{1}{\delta}\Big)}, \sqrt{\frac{2}{2}\log\Big(\frac{1}{\delta}\Big)}, \dots, \sqrt{\frac{2}{T}\log\Big(\frac{1}{\delta}\Big)}$ simultaneously, where $T \in \mathbb{N}$ is fixed. Can we do any better than the following union bound

\begin{equation*} \mathbb{P}\bigg[\bigcup_{t\in\{1,\dots,T\}} \bigg\{\frac{1}{t} \sum_{s=1}^t X_s \ge \sqrt{\frac{2}{t}\log\Big(\frac{1}{\delta}\Big)}\bigg\}\bigg] \le \sum_{t=1}^T \mathbb{P}\bigg[\frac{1}{t} \sum_{s=1}^t X_s \ge \sqrt{\frac{2}{t}\log\Big(\frac{1}{\delta}\Big)}\bigg] \le \delta\cdot T ? \end{equation*}

The problem with this last inequality is that a multiplicative factor $T$ appears in the bound, and I suspect that, by something in the spirit of maximal martingale inequalities, we can do much better.

Any suggestion or pointer to the literature is very welcome.

EDIT: I ran some simulations. Taking $X_1,X_2,\dots$ to be a $\mathbb{P}$-i.i.d. family of Rademacher random variables suggests that at least something of the form \begin{equation*} \forall T \in \mathbb{N}, \forall \delta \in \Big(0,\frac{1}{10}\Big), \qquad \mathbb{P}\bigg[\bigcup_{t\in\{1,\dots,T\}} \bigg\{\frac{1}{t} \sum_{s=1}^t X_s \ge \sqrt{\frac{2}{t}\log\Big(\frac{1}{\delta}\Big)}\bigg\}\bigg] \le \delta\cdot \log(T) \end{equation*} holds, and perhaps something even better (maybe an upper bound of the form $O(\delta \cdot \log\log T)$).
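For concreteness, here is a minimal sketch of such a simulation (the helper `union_probs` and all parameters are illustrative, not part of the question). It estimates the probability that the partial sum $S_t$ ever crosses $\sqrt{2t\log(1/\delta)}$ up to several horizons $T$, using shared Rademacher paths:

```python
import numpy as np

rng = np.random.default_rng(0)

def union_probs(T_list, delta, n_trials=20000):
    """Monte Carlo estimate of P(exists t <= T : S_t >= sqrt(2 t log(1/delta)))
    for i.i.d. Rademacher steps, evaluated at several horizons T on shared paths."""
    T_max = max(T_list)
    # threshold for the partial sum S_t, t = 1, ..., T_max
    thr = np.sqrt(2.0 * np.log(1.0 / delta) * np.arange(1, T_max + 1))
    first_hit = np.full(n_trials, T_max + 1)  # first crossing time (T_max+1 = never)
    for i in range(n_trials):
        S = np.cumsum(rng.choice([-1.0, 1.0], size=T_max))
        hits = np.nonzero(S >= thr)[0]
        if hits.size:
            first_hit[i] = hits[0] + 1
    return {T: float(np.mean(first_hit <= T)) for T in T_list}

delta = 0.05
probs = union_probs([10, 100, 1000], delta)
for T, p in probs.items():
    print(T, p, p / delta)  # the ratio p/delta grows far slower than T
```

Computing all horizons on the same paths makes the estimates monotone in $T$ by construction, so the growth of $p/\delta$ with $T$ can be read off directly.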

There are 2 answers below.

BEST ANSWER

The dependence on $T$ is logarithmic; see $(4)$ below. I could not find a precise reference, so I give an argument below, making sure to get explicit constants. The argument is adapted from the standard proof of the law of the iterated logarithm, which can be found, e.g., in Durrett's textbook Probability: Theory and Examples, and in [1].

Let $S_t:=\sum_{j=1}^t X_j$ and $\lambda:=\sqrt{2\log (1/\delta)}$, so that $e^{-\lambda^2/2}=\delta$. The goal is to estimate $\mathbb{P}(A)$, where $$A:=\bigcup_{t\in\{1,\dots,T\}} \bigg\{S_t \ge \lambda\sqrt{t} \bigg\} \,.$$ We will assume that $\lambda^2 \ge 2e$; otherwise $\mathbb{P}(A)$ will be close to $1$.

We first prove an upper bound. Fix $q=e^\beta>1$ and let $$\tau_n:=\inf\left\{ t \in [q^n, q^{n+1}] :\, S_t\geq \lambda q^{n/2} \right\},$$ with the convention that the infimum of the empty set is $\infty$. Observe that $$A \subset \bigcup_{n=0}^{N-1} \{\tau_n \le q^{n+1}\}\,, \quad \text{where}\quad N=\Bigl\lceil\frac{\log T}{\log q}\Bigr\rceil \,. \tag{1}$$

Next, apply the Azuma-Hoeffding inequality to the stopped martingale $M_t=S_{t\wedge\tau_n}$ to infer that $$\mathbb{P}(\tau_n \le q^{n+1})= \mathbb{P}\Bigl(M_{q^{n+1}} \ge \lambda q^{n/2} \Bigr) \le \exp\Bigl(-\frac{\lambda^2}{2q}\Bigr) \,.$$ Recalling that $q=e^\beta$, we deduce from $(1)$ that $$ \mathbb{P}(A) \le \Bigl\lceil\frac{\log T}{\beta}\Bigr\rceil \cdot \exp\Bigl(-\frac{\lambda^2}{2e^\beta}\Bigr) \,. \tag{2}$$

To minimize the right-hand side, we take $\beta \in (0,1]$ that satisfies $$ {e^\beta}/{\beta}=\lambda^2/2=\log(1/\delta) \,.\tag{3}$$ To get a concrete bound, we note that $(3)$ gives $$ \lambda^2/2 < {1}/{\beta}+2 \,, \quad \text{so}$$ $$\exp\bigl(- {1}/{ \beta}\bigr) < \exp( -\lambda^2/2+2) =e^2\delta \,.$$ Thus $(2)$ yields that $$ \mathbb{P}(A) \le \bigl(1+ \frac{\lambda^2}{2} \log T \bigr) \cdot \exp\Bigl(-\frac{1}{ \beta}\Bigr) < e^2\delta \log(e/\delta) \cdot \log T \,. \tag{4}$$
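As a numerical illustration (not part of the argument), one can solve $(3)$ for $\beta$ by bisection, using that $e^\beta/\beta$ is decreasing on $(0,1]$, and evaluate the right-hand side of $(2)$; the helper `peeling_bound` below is a sketch under these assumptions:

```python
import math

def peeling_bound(T, delta):
    """Right-hand side of (2): ceil(log T / beta) * exp(-lambda^2 / (2 e^beta)),
    with beta in (0, 1] chosen to satisfy (3): e^beta / beta = log(1/delta).
    Requires log(1/delta) >= e, i.e. lambda^2 >= 2e."""
    L = math.log(1.0 / delta)  # equals lambda^2 / 2
    assert L >= math.e, "need lambda^2 >= 2e"
    lo, hi = 1e-12, 1.0
    # e^beta / beta is decreasing on (0, 1], so bisection finds the root of (3)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if math.exp(mid) / mid > L:
            lo = mid  # value too large -> move beta upward
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return math.ceil(math.log(T) / beta) * math.exp(-L / math.exp(beta))

delta = math.exp(-10)
union_bound = 10**6 * delta
print(peeling_bound(10**6, delta), union_bound)  # the peeling bound is far smaller
```

For $\delta = e^{-10}$ and $T = 10^6$ the peeling bound is nontrivial (below $1$), whereas the naive union bound $\delta T$ is vacuous; doubling $\log T$ roughly doubles the bound, consistent with $(4)$.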


To show that the dependence on $T$ is indeed logarithmic (and not $f(\delta)\log \log T$ ), we give a rough lower bound in the special case that $X_i$ are independent Rademacher variables (i.e., take values $\pm 1$), although it works more generally if they satisfy a CLT. We also assume that $1<\lambda<T^{1/8}$, although this can be relaxed as well.

Fix any $\epsilon>0$ and an integer $r>1$. Consider the independent events $$D_n=\left\{ S_{r^n} -S_{r^{n-1}} > \lambda r^{n/2} \right\}.$$ If $r^n >\sqrt{T}$, then by a version of the central limit theorem (which can be found, e.g., in the book by Feller) and a lower tail bound for the Gaussian, $$\mathbb{P}(D_n) >\exp\Bigl(-\frac{\lambda^2 r}{2r-3}\Bigr)>\delta^{1+\epsilon} \,,$$ provided that $r$ is large enough so that $r/(r-3)<1+\epsilon$. Let $$N_0=\Bigl\lceil\frac{\log T}{2\log r}\Bigr\rceil \,.$$ If $\delta^{1+\epsilon} N_0<1$, then $$ \mathbb{P}\Bigl(\bigcup_{n=N_0}^{2N_0-2} D_n \Bigr) \ge 1-(1-\delta^{1+\epsilon})^{N_0-1} > (N_0-1)\delta^{1+\epsilon}/2 \,.$$ If $\sigma$ is the last $n \in [N_0,2N_0-2]$ such that $D_n$ holds (which is well defined if the union in the last display is nonempty), then $A$ holds provided $S_{r^{\sigma-1}} \ge 0$, and the latter event has probability at least $1/2$ even conditionally on $\sigma$. Thus we conclude that $$\mathbb{P}(A) \ge (N_0-1)\delta^{1+\epsilon}/4 \,.$$

[1] Brownian motion, by Peter Mörters and Yuval Peres. Cambridge University Press, 2010. https://yuvalperes.com/brownian-motion/

ANSWER

I'm not sure whether the following counts as better, but it's easy to get a bound of $O\left(\delta\sqrt{\log(1/\delta)}\cdot \sqrt{T}\right)$.

We have, for any $1 \le t \le T$, $$\mathbb{P}\left[\frac{1}{t}\sum_{s=1}^t X_s \ge \sqrt{\frac{2}{t}\log\left(\frac{1}{10\delta}\right)}\right] \le 10\delta.$$ Note that $$\sqrt{\frac{2}{t}\log\left(\frac{1}{\delta}\right)}-\sqrt{\frac{2}{t}\log\left(\frac{1}{10\delta}\right)} = \frac{\frac{2}{t}\log 10}{\sqrt{\frac{2}{t}\log(\frac{1}{\delta})}+\sqrt{\frac{2}{t}\log(\frac{1}{10\delta})}} \ge \frac{c}{\sqrt{t\log(1/\delta)}},$$ where $c = \frac{\log 10}{\sqrt{2}}$.

Since each $X_s \le 1$, we have $\sum_{s=1}^{t_0} X_s \ge \sum_{s=1}^{t} X_s - (t-t_0)$ whenever $t_0 \le t$. Therefore, if $$\frac{1}{t}\sum_{s=1}^t X_s \ge \sqrt{\frac{2}{t}\log\left(\frac{1}{\delta}\right)}$$ for some $t$ with $t_0 \le t \le t_0+\frac{c\sqrt{t_0}}{\sqrt{\log(1/\delta)}}$, then $$\frac{1}{t_0}\sum_{s=1}^{t_0} X_s \ge \sqrt{\frac{2}{t_0}\log\left(\frac{1}{10\delta}\right)}.$$ There are $m := C\sqrt{T\log(1/\delta)}$ values of $t$ (for an absolute constant $C$), say $t_1,\dots,t_m$, for which $[T] = \cup_{j=1}^m \left[t_j,t_j+\frac{c\sqrt{t_j}}{\sqrt{\log(1/\delta)}}\right]$. Therefore, \begin{align*} \mathbb{P}\bigg[\bigcup_{t\in\{1,\dots,T\}} \bigg\{\frac{1}{t}\sum_{s=1}^t X_s \ge \sqrt{\frac{2}{t}\log\Big(\frac{1}{\delta}\Big)}\bigg\}\bigg] &\le \mathbb{P}\bigg[\bigcup_{j=1}^m \bigg\{\frac{1}{t_j}\sum_{s=1}^{t_j} X_s \ge \sqrt{\frac{2}{t_j}\log\Big(\frac{1}{10\delta}\Big)}\bigg\}\bigg] \\ &\le m\cdot 10\delta \\ &= O\left(\delta\sqrt{\log(1/\delta)}\cdot \sqrt{T}\right). \end{align*}
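To make the counting concrete, here is a sketch (the greedy helper `build_grid` is illustrative) that constructs checkpoints $t_1 < \dots < t_m$ whose intervals cover $[T]$, taking $c = \log(10)/\sqrt{2}$ as the constant coming out of the threshold-gap computation; the number of checkpoints $m$ grows like $\sqrt{T\log(1/\delta)}$:

```python
import math

def build_grid(T, delta):
    """Greedily pick checkpoints t_1 < t_2 < ... so that the intervals
    [t_j, t_j + c*sqrt(t_j)/sqrt(log(1/delta))] cover {1, ..., T}."""
    c = math.log(10) / math.sqrt(2)
    root_log = math.sqrt(math.log(1.0 / delta))
    grid, t = [], 1
    while t <= T:
        grid.append(t)
        # next checkpoint starts right after the current interval ends
        t = int(t + c * math.sqrt(t) / root_log) + 1
    return grid

delta = 0.01
for T in (10**3, 10**4, 10**5):
    m = len(build_grid(T, delta))
    print(T, m, m / math.sqrt(T * math.log(1 / delta)))  # last column roughly constant
```

The recursion $t_{j+1} \approx t_j + c\sqrt{t_j}/\sqrt{\log(1/\delta)}$ gives $t_j \asymp j^2$, which is where the $\sqrt{T}$ count comes from.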