Proof of the Matrix Hoeffding lemma


I am trying to find a way of convincing myself of the validity of the Matrix Hoeffding lemma. The lemma states the following:

Consider a set $\{X^{(1)},\ldots,X^{(m)}\}$ of independent, identically distributed random Hermitian matrices of dimension $k\times k$, each with the distribution of $X$. Assume that $\mathbb E[X]$ is finite and that $X^2\preceq\sigma^2 I$ almost surely, for some positive constant $\sigma$. Then, for all $\epsilon\geq 0$, $$\Pr\left(\left\|\frac{1}{m}\sum_{i=1}^mX^{(i)}-\mathbb E[X]\right\|_2\geq\epsilon\right)\leq ke^{-m\epsilon^2/(8\sigma^2)}.$$
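As a quick numerical sanity check (not part of the proof), here is a small simulation against the bound in the form $k\,e^{-m\epsilon^2/(8\sigma^2)}$, which is the form that comes out of Tropp's argument. The toy distribution $X=s\,\sigma A$ (a Rademacher sign $s$ times a fixed Hermitian $A$ with $\|A\|_2=1$, so $\mathbb E[X]=0$ and $X^2\preceq\sigma^2 I$), the sample sizes, and the use of numpy are all choices for illustration:

```python
import numpy as np

# Toy distribution: X = s * sigma * A, with s a Rademacher sign and A a fixed
# Hermitian matrix of spectral norm 1, so E[X] = 0 and X^2 <= sigma^2 I a.s.
rng = np.random.default_rng(0)
k, m, sigma, eps = 4, 50, 1.0, 0.5
trials = 2000

# Fixed Hermitian A with spectral norm 1.
B = rng.standard_normal((k, k))
A = (B + B.T) / 2
A /= np.linalg.norm(A, 2)

deviations = np.empty(trials)
for t in range(trials):
    signs = rng.choice([-1.0, 1.0], size=m)
    S = (sigma * signs.sum() / m) * A   # (1/m) sum_i X^(i); here E[X] = 0
    deviations[t] = np.linalg.norm(S, 2)

empirical = np.mean(deviations >= eps)                # empirical tail probability
bound = k * np.exp(-m * eps**2 / (8 * sigma**2))      # matrix Hoeffding bound
print(empirical, bound)
```

In this run the empirical tail probability sits far below the bound, as expected: matrix Hoeffding is conservative for such a simple one-dimensional family of matrices.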

I pretty much understand the proof of Hoeffding's inequality that uses Jensen's inequality and properties of moment generating functions, but I am having trouble applying these notions to random matrices. For example, I understand that $X^2\preceq\sigma^2 I$ means that every eigenvalue of $X^2$ is at most $\sigma^2$, i.e. $\|X\|_2\leq\sigma$ almost surely. So if the variance is small, the empirical average should indeed concentrate around the mean of $X$. I get the intuition, but I cannot find a way to prove it properly.

If someone could help me out here, it would be amazing.

Best answer:

The arguments used to prove the usual (one-dimensional) Hoeffding inequality don't directly extend to random matrices. The full proof of this result is given in Section 7 of Joel Tropp's paper User-friendly tail bounds for sums of random matrices, and relies mainly on the following three results:

Lemma 1 (Proposition 3.1): For any random Hermitian matrix $X$ and real number $t$, $$ \mathbb P(\|X\|_2\ge t)\le\inf_{\theta>0}\left\{e^{-\theta t}\ \mathbb E[\text{tr }e^{\theta X}]\right\}$$
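For intuition, Lemma 1 follows from Markov's inequality exactly as in the scalar Chernoff bound. A sketch of the one-sided version (with $\lambda_{\max}$ in place of the norm): for any $\theta>0$, $$\mathbb P(\lambda_{\max}(X)\ge t)=\mathbb P\big(e^{\theta\lambda_{\max}(X)}\ge e^{\theta t}\big)\le e^{-\theta t}\,\mathbb E\big[e^{\theta\lambda_{\max}(X)}\big]=e^{-\theta t}\,\mathbb E\big[\lambda_{\max}(e^{\theta X})\big]\le e^{-\theta t}\,\mathbb E[\text{tr }e^{\theta X}],$$ where the equality uses the spectral mapping $e^{\theta\lambda_{\max}(X)}=\lambda_{\max}(e^{\theta X})$, and the last step uses that $e^{\theta X}$ is positive definite, so its largest eigenvalue is at most its trace. Taking the infimum over $\theta>0$ gives the claim.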

Lemma 2 (Lemma 7.6): For any fixed Hermitian matrix $H$ and random Hermitian matrix $X$ such that $\mathbb E[X]=0$, it holds that $$\mathbb E[\text{tr }e^{H+X}]\le \mathbb E[\text{tr }e^{H+2\varepsilon X}], $$ where $\varepsilon$ is a Rademacher random variable independent of $X$.
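A sketch of the symmetrization argument behind Lemma 2: let $X'$ be an independent copy of $X$. Since $A\mapsto\text{tr }e^{H+A}$ is convex, Jensen's inequality applied to $X'$ (using $\mathbb E[X']=0$) gives $$\mathbb E[\text{tr }e^{H+X}]=\mathbb E[\text{tr }e^{H+X-\mathbb E[X']}]\le\mathbb E[\text{tr }e^{H+X-X'}].$$ The difference $X-X'$ is symmetric, so it has the same distribution as $\varepsilon(X-X')$, and writing $\varepsilon(X-X')=\tfrac12(2\varepsilon X)+\tfrac12(-2\varepsilon X')$, convexity again yields $$\mathbb E[\text{tr }e^{H+\varepsilon(X-X')}]\le\tfrac12\,\mathbb E[\text{tr }e^{H+2\varepsilon X}]+\tfrac12\,\mathbb E[\text{tr }e^{H-2\varepsilon X'}]=\mathbb E[\text{tr }e^{H+2\varepsilon X}],$$ using that $-\varepsilon X'$ has the same distribution as $\varepsilon X$.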

Lemma 3 (Lemma 7.7): For any fixed Hermitian matrix $A$ and random Hermitian matrix $X$ such that $X^2\preceq A^2$, the following holds: $$\log \mathbb E[e^{2\theta \varepsilon X}\mid X] \preceq 2\theta^2 A^2\quad \forall \theta\in\mathbb R, $$ where $\varepsilon$ is a Rademacher random variable independent of $X$.
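Lemma 3 is a matrix version of the scalar bound $\cosh(a)\le e^{a^2/2}$. Conditionally on $X$, $$\mathbb E[e^{2\theta\varepsilon X}\mid X]=\tfrac12 e^{2\theta X}+\tfrac12 e^{-2\theta X}=\cosh(2\theta X)\preceq e^{2\theta^2 X^2},$$ where the semidefinite inequality applies $\cosh(a)\le e^{a^2/2}$ to each eigenvalue of $2\theta X$ (both sides are functions of the same matrix, so they diagonalize simultaneously). Since $\log$ is operator monotone, this gives $$\log\mathbb E[e^{2\theta\varepsilon X}\mid X]\preceq 2\theta^2 X^2\preceq 2\theta^2 A^2.$$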

Note that in the paper, the author actually proves the result for matrix martingales (i.e. a matrix Azuma–Hoeffding inequality), but you can easily recover your desired result by considering $S_m:=\sum_{i=1}^m X^{(i)}/m - \mathbb E[X]$ and $Z_k := \mathbb E[S_m\mid X^{(1)},\ldots,X^{(k)}]$. By construction $(Z_k)$ is a martingale with $Z_0=\mathbb E[S_m]=0$ and $Z_m=S_m$, so the increments $Z_k - Z_{k-1}$ form a martingale difference sequence and $S_m = \sum_{k=1}^m (Z_k - Z_{k-1})$; hence the martingale result applies here as well.
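Schematically, with $Y_i := X^{(i)}-\mathbb E[X]$ (so $\mathbb E[Y_i]=0$), the lemmas combine as $$\mathbb P\left(\lambda_{\max}\Big(\sum_{i=1}^m Y_i\Big)\ge m\epsilon\right)\le e^{-\theta m\epsilon}\,\mathbb E\Big[\text{tr }e^{\theta\sum_i Y_i}\Big]\le e^{-\theta m\epsilon}\,k\,e^{2m\theta^2\sigma_*^2},$$ where the first inequality is Lemma 1 and the second is obtained by peeling off the $Y_i$ one at a time using Lemmas 2 and 3 (here $\sigma_*^2$ denotes the variance parameter, which under your assumptions is a constant multiple of $\sigma^2$ since centering at most doubles the almost-sure norm bound). Optimizing over $\theta$, i.e. taking $\theta=\epsilon/(4\sigma_*^2)$, turns the exponent $-\theta m\epsilon+2m\theta^2\sigma_*^2$ into $-m\epsilon^2/(8\sigma_*^2)$, which is the claimed bound.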

Remark: Under your assumptions, you have a bound on the eigenvalues of $X$, and hence on its entries. You can therefore naively apply Hoeffding's inequality entry-wise together with a union bound, which yields a similar inequality with a factor $k^2$ instead of $k$ in front of the exponential (and possibly an additional $k$-dependent factor inside the exponent).
Of course, this bound is much weaker, especially when $k$ is large, but depending on your application it may be sufficient.