Johnson–Lindenstrauss: What is the dependence on the "variance factor"?


Boucheron, Lugosi, and Massart state the following version of the Johnson–Lindenstrauss theorem (simplified to hold for a single vector) as Theorem 2.13 in their book:

Let $S\in\mathbb{R}^{d\times n}$ be a matrix with independent mean-zero, unit variance, $v$-subgaussian random entries. Then $$\mathbb{P} \left\{ \left|\frac{1}{d} \|Sx\|^2 - \|x\|^2\right| \ge \varepsilon \|x\|^2 \right\} \le \delta$$ for $\delta,\varepsilon \in (0,1)$ provided $$ d \ge C \color{red}{v} \varepsilon^{-2} \log(1/\delta). \tag{$\star$}$$

Here, and throughout, $C$ denotes an arbitrary universal positive constant whose value is subject to change.

For the purposes of this post, a mean-zero random variable $X$ is $v$-subgaussian (a.k.a. subgaussian with variance factor $v$) if its cumulant generating function satisfies

$$ \xi_X(\theta) := \log \mathbb{E} \exp(\theta X) \le \frac{1}{2} v\theta^2 \quad \text{for all } \theta \in \mathbb{R}. $$
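As a concrete sanity check (my own illustration, not from the book): a Rademacher variable $X$ with $\mathbb{P}\{X = \pm 1\} = 1/2$ has $\xi_X(\theta) = \log\cosh\theta$, and it is $1$-subgaussian since $\log\cosh\theta \le \theta^2/2$. This inequality is easy to verify numerically on a grid:

```python
import numpy as np

# A Rademacher variable has exact CGF xi(theta) = log cosh(theta).
# It is 1-subgaussian: log cosh(theta) <= theta^2 / 2 for all theta.
theta = np.linspace(-10.0, 10.0, 2001)
xi = np.log(np.cosh(theta))   # exact cumulant generating function
bound = 0.5 * theta**2        # subgaussian bound with variance factor v = 1

assert np.all(xi <= bound + 1e-12)
print("log cosh(theta) <= theta^2 / 2 holds on the grid")
```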

My question concerns the dependence of $d$ on $v$ in the bound ($\star$). When I try to prove this result myself using similar techniques, I instead obtain the alternative bound

$$ d \ge C \color{red}{v^2} \varepsilon^{-2} \log(1/\delta). \tag{$\star\star$} $$

My question:

What is the sharp dependence on the variance factor $v$ in this single-vector version of the Johnson–Lindenstrauss theorem: ($\star$) or ($\star\star$)? If it is ($\star$), why does the proof below give a suboptimal result?

Proof of ($\star\star$). Without loss of generality take $\|x\| = 1$ and set $Y_i = (Sx)_i$. By additivity of the cumulant generating function $\xi_{(\cdot)}(\theta)$ over independent summands and the normalization of $x$, it is easily checked that $Y_i$ is mean-zero, unit-variance, and $v$-subgaussian. Consequently, $Y_i^2-1$ is mean-zero and has subexponential norm at most $Cv$ for an absolute constant $C > 0$ (see Lemma 2.7.6). By the Bernstein inequality for subexponential random variables (Theorem 2.8.1),
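For completeness, here is the "easily checked" step spelled out (my own filling-in, using $\xi_{cX}(\theta) = \xi_X(c\theta)$, independence of the entries of $S$, and $\|x\| = 1$):

$$ \xi_{Y_i}(\theta) = \sum_{j=1}^{n} \xi_{S_{ij} x_j}(\theta) = \sum_{j=1}^{n} \xi_{S_{ij}}(x_j \theta) \le \sum_{j=1}^{n} \frac{1}{2} v x_j^2 \theta^2 = \frac{1}{2} v \theta^2 \|x\|^2 = \frac{1}{2} v \theta^2. $$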

\begin{align} \mathbb{P} \left\{ \left|\frac{1}{d} \|Sx\|^2 - \|x\|^2\right| \ge \varepsilon \|x\|^2 \right\} &= \mathbb{P} \left\{ \left|\sum_{i=1}^d (Y_i^2-1)\right| \ge d\varepsilon\right\} \\ &\le 2 \exp \left( -C \min\left( \frac{d\varepsilon^2}{v^2},\frac{d\varepsilon}{v} \right) \right). \end{align}

To achieve a failure probability of $\delta$ it suffices that

$$ \delta \ge 2 \exp \left( -C \min\left( \frac{d\varepsilon^2}{v^2},\frac{d\varepsilon}{v} \right) \right). $$

Since the entries have unit variance, a Taylor expansion of $\xi$ at $\theta = 0$ forces $v \ge 1$; hence $\varepsilon < 1 \le v$, the first term attains the minimum, and rearranging gives $d \ge C v^2 \varepsilon^{-2} \log(1/\delta)$.
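As a numerical aside (my own sketch, not part of either bound's proof), one can estimate the failure probability $\mathbb{P}\{|\frac{1}{d}\|Sx\|^2 - \|x\|^2| \ge \varepsilon\}$ by Monte Carlo for standard Gaussian entries (so $v = 1$, where ($\star$) and ($\star\star$) coincide) and watch it decay as $d$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def failure_rate(d, n=50, eps=0.5, trials=2000):
    """Monte Carlo estimate of P{ |(1/d)||Sx||^2 - ||x||^2| >= eps }
    for S with i.i.d. standard Gaussian entries and a fixed unit vector x."""
    x = np.ones(n) / np.sqrt(n)                  # any unit vector works
    S = rng.standard_normal((trials, d, n))      # one projection per trial
    sq_norms = np.sum((S @ x) ** 2, axis=1) / d  # (1/d)||Sx||^2, shape (trials,)
    return np.mean(np.abs(sq_norms - 1.0) >= eps)

# The failure probability should drop roughly like exp(-C d eps^2).
for d in (5, 20, 80):
    print(d, failure_rate(d))
```

Of course this only probes $v = 1$; distinguishing $v$ from $v^2$ empirically would require a family of unit-variance distributions with growing variance factor.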