Proof check: Concentration of maximum of a certain empirical process

56 Views Asked by At

Let $x_1,\ldots,x_n$ be iid uniformly distributed on the unit-sphere in $\mathbb R^d$ and let $y_1,\ldots,y_n$ be iid uniformly distributed on $\{\pm 1\}$, and independent of the $x_i$'s. Define $Z_n:=\sup_{u \in \mathbb R^d}\sum_{i=1}^n f_u(x_i,y_i)$, where the functions $f_u: \mathbb S_{d-1}\times \{\pm 1\} \to \mathbb R$ are defined by $f_u(x,y) := \dfrac{yx^\top u}{\|u\|}$.

Goals. I'm interested in tail-bounds for the $Z_n$. Ideally, I'd like to get high-probability upper-bound for $Z_n$, valid for at least large $n$ and $d$.

Note that for all $i \in [n]$ and $u \in \mathbb R^d$, we have following:

  • $\mathbb E[f_u(x_i,y_i)] = 0$
  • $\|f_u\|_\infty := \sup_{x \in \mathbb S_{d-1},\;y \in \{\pm 1\}} \dfrac{|yx^\top u|}{\|u\|} \le 1$
  • $\mbox{var}(f_u(x_i,y_i)) = \mathbb E_{y_i,x_i}\left[\dfrac{(x_i^\top u)^2}{\|u\|^2}\right] = 1/d=:\sigma^2$.
  • $\mathbb E[Z_n] = \mathbb E[\sup_{u \in \mathbb R^d} \sum_{i=1}^n y_i(x_i^\top u)/\|u\|] = \mathbb E[\sup_{u \in \mathbb S_{d-1}} \sum_{i=1}^n y_i x_i^\top u] = n\mathcal R_n(\mathcal H)$, i.e $n$ times the Rademarcher complexity of the class of linear $L_2$-bounded functions $\{\langle u,\cdot\rangle \mid u \in \mathbb S_{d-1}\}$ evalauted on the dataset $\{x_1,\ldots,x_n\}$. But it is a standard result that $\mathcal R_n(\mathcal H) \le (1/n)\sqrt{\sum_{i=1}^n \|x_i\|^2} = 1/\sqrt{n}$, and so $\mathbb E[Z_n] \le n/\sqrt{n}=\sqrt{n}$.

Define $\nu := 2\mathbb E[Z_n] + n\sigma^2 = 2\mathbb E[Z_n] + n/d$. Then, by Bousquet's inequality (see Theorem 2.1, for example), we have $$ \mathbb P(Z_n \le \mathbb E[Z_n] + \sqrt{2\nu t} + t/3) \ge 1 - e^{-t}, $$

and so, invoking $\mathbb E[Z_n] \le \sqrt{n}$ and taking $t = \Omega(\sqrt{n} \land d)$, we get

$$ \mathbb P(Z_n = \mathcal O(\sqrt{n})) \ge 1-e^{-\Omega(\sqrt{n} \land d)}. $$

Question. Is my reasoning correct, or else where have I made an error ?

Thanks in advance.