What is meant by this useage of the $\mathcal{O}$ notation in statistics

55 Views Asked by At

I'm reading this statistics paper, and one of the theorems reads (emphasis mine)

Theorem: Let $(\mathcal{X},\mathcal{A},P)$ be an arbitrary probability space, $\mathcal{F}$ a class of real-valued functions on $\mathcal{X}$ with $||f||_{\infty} \leq 1$. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of i.i.d. random variables drawn according to $P$, and $(P_n)_{n \in \mathbb{N}}$ the corresponding empirical distributions. Then there exists some constant $c > 0$ such that, for all $n\in \mathbb{N}$ with probability at least $\boldsymbol{1−\delta}$, $$\operatorname{sup}_{f \in \mathcal{F}} \left| P_n - Pf \right| \leq \frac{c}{\sqrt{n}} \int_0^{\infty}\sqrt{\operatorname{log} N(\mathcal{F}, \varepsilon, L_2(P_n))} d\varepsilon + \sqrt{\frac{1}{2n}\operatorname{log}\boldsymbol{\frac{1}\delta}}.$$

The paper then goes on to say that

We can see that if $\int_0^{\infty}\sqrt{\operatorname{log} N(\mathcal{F}, \varepsilon, L_2(P_n))} d\varepsilon < \infty$, then the whole expression scales as $\mathcal{O}(1/√n)$

They then show that indeed, the integral is finite. They conclude: (emphasis mine)

Let $X$ be compact subset of $\mathbb{R}^d$ and $k(x,y) = \operatorname{exp}(−||x−y||^2/\sigma^2)$. Then the eigenvectors in Theorem 16 converge with rate $\boldsymbol{\mathcal{O}(1/\sqrt{n})}$

where Theorem 16 states

[...] such that, $$||a_n u_n - u||_{\infty} \leq C' \operatorname{sup}_{f \in \mathcal{F}} \left| P_n - Pf \right|$$

which holds almost surely and $u_n$ are the eigenvectors mentioned in the preceeding quote.

My question is what they mean by "the eigenvectors converge with rate ${\mathcal{O}(1/\sqrt{n})}$".

In my eyes, the statement seems to imply that the eigenvectors converge almost surely with a rate of ${\mathcal{O}(1/\sqrt{n})}$, since the inequality in theorem 16 holds almost surely and nowhere do they mention that the convergence is meant to be in probabilty. However, the theorem seems to only give us that $$\mathbb{P}\left( \operatorname{sup}_{f \in \mathcal{F}} \left| P_n - Pf \right| > \frac{C}{\sqrt{n}} (1 + \operatorname{log}\frac{1}{\delta}) \right) \leq \delta,$$ which would give us that $$||a_n u_n - u||_{\infty} \in \mathcal{O}_p \left(\frac{1}{\sqrt{n}}\right),$$ i.e. the eigenvectors are stochastically bounded. Here i used the Wikipedia definition of $\mathcal{O}_p$.

My guess is that one of the following explanations applies here:

  1. In this context it is customary to mean $\mathcal{O}_p$ when writing $\mathcal{O}$.
  2. Because in our context $\mathcal{X}$ is compact and $\mathcal{F}$ is a Glivenko-Cantelli class, by some magic the statement of the theorem holds almost surely and independent of $\delta$. Since, for Glivenko-Cantelli classes, convergence in probabilty imples almost surely convergence, i can see this somehow working out. However i don't understand why the rate of convergence in probabilty would translate directly to the rate of almost surely convergence.
  3. I made some other conceptual mistake.

Thank you for helping!