Understanding $\epsilon-$net arguments for random matrix bounds

987 Views Asked by At

I'm reading this book to learn about $\epsilon$-net arguments for random matrix concentration bounds. In Section 4.2.2, a high probability upper bound is computed for the operator norm of matrix $A \in \mathbb{R}^{m \times n}$, whose elements are independent, zero mean, and sub-Gaussian. I don't fully understand what the purpose of constructing an $\epsilon$-net is in this proof. Specifically, they prove that for any fixed $x \in S^{n-1}, y \in S^{m-1}$, we have \begin{align} P(\langle Ax, y\rangle \geq u) \leq 2 \exp(-cu^2/K^2), \end{align} where $K = \max \|A_{ij}\|_{\psi_2}$. We know that $\|A\|_2 = \max_{x \in S^{n-1}, y \in S^{m-1}} \langle Ax, y\rangle$. if we know that the above bound is true for any $x \in S^{n-1}, y \in S^{m-1}$, doesn't it imply that $P(\|A\|_2 \geq u) \leq 2 \exp(-cu^2/K^2)$? If so, why do we need the $\epsilon-$net argument?

1

There are 1 best solutions below

2
On BEST ANSWER

You know that $\max_{x,y} P(\langle Ax,y\rangle \geq u) \leq 2\exp(-cu^2/K^2)$, which does not allow you to infer $P(\max_{x,y} \langle Ax,y\rangle \geq u) \leq 2\exp(-cu^2/K^2)$ (note the difference between the two statements).

The event $\{\max_{x,y} \langle Ax,y\rangle \geq u\}$ is a union of the events $\{\langle Ax,y\rangle \geq u\}$ over all $x,y$, i.e. $$ \{\max_{x,y} \langle Ax,y\rangle \geq u\} = \bigcup_{x,y} \{\langle Ax,y\rangle \geq u\}, $$ since $\max_{x,y} \langle Ax,y\rangle \geq u$ if and only if $\langle Ax,y\rangle \geq u$ for some $x$, $y$. This means that the event $\{\max_{x,y} \langle Ax,y\rangle \geq u\}$ in general has a larger probability that the events $\{\langle Ax,y\rangle \geq u\}$ individually, and knowing something about the events $\{\langle Ax,y\rangle \geq u \}$ individually will not directly give you anything about the union event.

When other approaches don't apply, one strategy of estimating such a union is to use countable subadditivity and upper bound it by a sum of the individual probabilities. However, for that you need the union to be countable, which is why people use $\varepsilon$-nets.