Need help understanding the proof of the Johnson-Lindenstrauss lemma by Dasgupta and Gupta


The Johnson-Lindenstrauss lemma is the following:

Let $\varepsilon\in(0,1)$, let $\mathcal{P}\subseteq\mathbb{R}^d$ be a finite set of $n$ points, and let $k\in\mathcal{O}(\varepsilon^{-2}\log{n})$. Then there exists a (linear) map $M:\mathbb{R}^d\rightarrow\mathbb{R}^k$ s.t. $$(1-\varepsilon)\Vert x_i - x_j \Vert^2 \leq \Vert Mx_i - Mx_j\Vert^2 \leq (1+\varepsilon)\Vert x_i - x_j\Vert^2$$ for every $x_i,x_j\in\mathcal{P}$.

In the original proof, Johnson and Lindenstrauss showed that a random orthogonal projection of rank $k$ has this property with positive probability, and hence concluded the existence of such a map. Here a random orthogonal projection of rank $k$ is obtained by drawing a map $U\in O(\mathbb{R}^d)$ from the orthogonal group on $\mathbb{R}^d$ w.r.t. the Haar measure and setting $M=cU^*QU$, where $Q$ is the projection onto the first $k$ coordinates, $U^*$ is the matrix consisting of the first $k$ rows of $U^T$, and $c$ is a suitable normalizing factor.
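For concreteness, here is a small numerical sketch of this construction (my own illustration, not code from either paper). It assumes the standard fact that QR-decomposing a matrix with i.i.d. Gaussian entries, after absorbing the signs of $R$'s diagonal into $Q$, yields a Haar-distributed orthogonal matrix, and takes $c=\sqrt{d/k}$ as the normalizing factor:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 10

# Haar-distributed orthogonal U: QR-decompose a Gaussian matrix and
# absorb the signs of R's diagonal into the orthogonal factor.
Z = rng.standard_normal((d, d))
Qf, R = np.linalg.qr(Z)
U = Qf * np.sign(np.diag(R))

# M x = c * (first k coordinates of U x), with c = sqrt(d/k),
# i.e. M is c times the first k rows of U.
c = np.sqrt(d / k)
M = c * U[:k, :]

# Sanity check: the rows of U[:k, :] are orthonormal, so M M^T = (d/k) I_k.
print(np.allclose(M @ M.T, (d / k) * np.eye(k)))
```

Taking the first $k$ coordinates of $Ux$ is exactly $QUx$ read as a vector in $\mathbb{R}^k$, so this matches the decomposition $M=cU^*QU$ above.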

Several authors claim that Dasgupta and Gupta proved that if one chooses a $k\times d$ matrix whose rows are standard Gaussian vectors scaled by a factor of $\sqrt{d/k}$, then such a matrix has the desired property with positive probability as well. Unfortunately, I don't see how they arrive at this conclusion. What am I missing?
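To make the claimed behavior concrete, here is a small experiment (my own sketch, not from the papers). I use a $k\times d$ matrix with i.i.d. $N(0,1)$ entries scaled by $1/\sqrt{k}$, which is the normalization making $\mathbb{E}\Vert Mx\Vert^2 = \Vert x\Vert^2$; if one instead thinks of rows with $N(0,1/d)$ entries, the same normalization is the $\sqrt{d/k}$ scaling mentioned above:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n, eps = 1000, 400, 8, 0.5

# Entries N(0,1), scaled by 1/sqrt(k) so that E||Mx||^2 = ||x||^2.
M = rng.standard_normal((k, d)) / np.sqrt(k)

# A small point set in R^d.
P = rng.standard_normal((n, d))

# Check the JL distortion bound for every pair of points.
ok = True
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((P[i] - P[j]) ** 2)
        proj = np.sum((M @ (P[i] - P[j])) ** 2)
        ok &= (1 - eps) * orig <= proj <= (1 + eps) * orig
print(ok)
```

With $k$ this large relative to $\varepsilon^{-2}\log n$, a single random draw satisfies the bound for all pairs with overwhelming probability; the question is why this holds for the Gaussian matrix and not only for the projection-based construction.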

I think it is best if I first describe what I think happens in their proof:

Dasgupta and Gupta (http://cseweb.ucsd.edu/~dasgupta/papers/jl.pdf) observed that because $U$ is drawn according to the Haar measure, the image $Uy$ of a fixed vector $y$ on the sphere is distributed uniformly on the unit sphere. Hence, instead of proving concentration inequalities for random projections (which in the original proof required results on measure concentration), one can equivalently prove concentration inequalities for the projection onto a fixed subspace (namely the first $k$ coordinates) of a point distributed uniformly on the sphere. So they show that the projection onto the first $k$ coordinates of a point $p$ distributed uniformly on the unit sphere has expected squared length $k/d$, and that the squared length is moreover sharply concentrated around this mean. Now take two points $x,y\in\mathcal{P}$ from the original finite set and consider the difference $z = x - y$. Since $z/\Vert z\Vert$ lies on the unit sphere, estimating the probability that the squared distance $\Vert M(x-y)\Vert^2 = \Vert z\Vert^2\,\Vert M(z/\Vert z\Vert)\Vert^2$ between the projections of $x$ and $y$ falls outside the desired range boils down to applying the concentration inequalities derived for the point $z/\Vert z \Vert$ on the unit sphere. The claim then follows from a simple union bound over the set $\{x-y \mid x,y\in\mathcal{P},\, x\neq y\}$.
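The $k/d$ expectation in the step above is easy to check numerically (again my own sketch, using the standard fact that normalizing a Gaussian vector gives a uniform point on the sphere):

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, trials = 100, 20, 20000

# Uniform points on the unit sphere in R^d: normalize Gaussian vectors.
Z = rng.standard_normal((trials, d))
P = Z / np.linalg.norm(Z, axis=1, keepdims=True)

# Squared length of the projection onto the first k coordinates.
sq = np.sum(P[:, :k] ** 2, axis=1)
print(sq.mean())  # close to k/d = 0.2
```

The empirical mean concentrates around $k/d$, matching the expectation Dasgupta and Gupta compute; their Lemma then quantifies how sharply $\Vert Qp\Vert^2$ concentrates around it.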

So it seems to me that Dasgupta and Gupta still use the decomposition $M=cU^*QU$ with $U\in O(\mathbb{R}^d)$, and merely simplified the proof. Indeed, the second line of their proof literally reads "Else take a random $k$-dimensional subspace $S$ [...]".