I am trying to find a way of convincing myself of the validity of the Matrix Hoeffding lemma. The lemma states the following:
Consider a set $\{X^{(1)},\ldots,X^{(m)}\}$ of independent, random, Hermitian matrices of dimension $k\times k$, with identical distribution $X$. Assume that $E[X]$ is finite and $X^2\preceq\sigma^2 I$ for some positive constant $\sigma$ almost surely. Then, for all $\epsilon\geq 0$, $$Pr\left(\left|\left|\frac{1}{m}\sum_{i=1}^mX^{(i)}-E[X]\right|\right|_2\geq\epsilon\right)\leq ke^{-m^2\epsilon^2/8\sigma^2}.$$
I pretty much understand the proof of Hoeffding's inequality that uses Jensen's inequality and properties of moment generating functions but I am having trouble applying these notions to the case of random matrices. Namely, I understand that $X^2\preceq\sigma^2 I$ for examples means that $X^2$ will be $\epsilon_x$-close to $\sigma^2$ for some small constant $\epsilon_x$. So indeed, if the variances are small, then the sum should be close to the mean of $X$. I get the intuition, but do not find a way to prove it properly.
If someone could help me out here, it would be amazing.
The arguments used to prove the usual (1D) Hoeffding's inequality don't directly extend to the random matrices case. The full proof of this result is given in Section 7 of Joel Tropp's paper User-friendly tail bounds for sums of random matrices, and relies mainly on these three results :
Note that in the paper, the author actually proves the result for matrix martingales (i.e. Azuma-Hoeffding), but you can easily recover your desired result by considering $S_m:=\sum_{i=1}^m X^{(i)}/m - \mathbb E[X]$ and $Z_k := \mathbb E[S_m\mid X^{(1)},\ldots,X^{(k)}]$. You have by construction that $Z_k - Z_{k-1}$ is a martingale difference sequence and $S_m = \sum_{k=1}^m Z_k - Z_{k-1}$, hence the result applies for it too.
Remark : Under your conditions, you have a bound on the eigenvalues of $X$, and hence on the entries of $X$. Hence, you can naively apply a union bound and apply Hoeffding's inequlity entry-wise, which will yield a similar inequlity with a factor $k^2$ instead of $k$ in front of the exponential (and possibly another factor depending on $k$ inside the exponent too).
Of course, it is quite bad, especially if $k$ is large, but depending on your application it may be sufficient.