Understanding an Inequality In a Paper

127 Views Asked by At

The paper is the following:

https://arxiv.org/pdf/1608.06412.pdf

Right above (5) on page 5, the first inequality of the line where they use lemma 8, I'm not quite sure I see how they are using lemma 8.

Lemma 8: Given $d$-dimensional square diagonal matrix $D$ and arbitrary square $d$ dimensional matrix $M$, $\max_i \min_j |\sigma_i(D) - \sigma_j(M)| \leq ||D-M||_{op}$

The inequality in question is

$||(I + B_j(\Sigma + \lambda I)^{-1})^{-1}||_{op} \leq \frac{1}{1 - \frac{||B_j||_{op}}{\lambda}}$

where $B_j = -(X_j X_j^T + \lambda I)/n$ and $\Sigma = X^T X /n $ where $X$ is just some arbitrary $n \times d$ data matrix.

I assume lemma 7 is also useful:

$||(M + \lambda I)||_{op} \leq 1/\lambda$

From my guess, they just applied the inverse of lemma 8, however that means that there is an issue with norm of the inverse vs inverse of the norm which I can't resolve (the inequality between norm inverse and inverse norm goes the wrong direction). Then they use submultiplicative property of norm along with lemma 7 to get the rough form of the RHS. I'm just not sure how lemma 8 is exactly applied because of this issue. Am I missing some matrix identity? Possibly woodbury?