Assume $X_1,\dots, X_n \stackrel{\text{i.i.d.}}\sim N(\mu, \Sigma)$ and $Y_1,\dots, Y_n \stackrel{\text{i.i.d.}}\sim N(\gamma, \Gamma)$ are $d$-dimensional random vectors and the two sequences are pairwise independent, I want to find a high probability bound for the operator norm of the $d \times d$ covariance matrix: $$ \| \hat{C}\| = \left \| \frac{1}{n-1}\sum_{i=1}^n(X_i - \bar{X})(Y_i - \bar{Y})^T \right \|. $$
A trivial bound would use the fact that w.h.p. $\| X_i-\mu\|_2 \lesssim \text{trace}(\Sigma)$, and so w.h.p. \begin{align*} \| \hat{C}\| &\le \frac{1}{n-1}\sum_{i=1}^{n} \| (X_i - \bar{X})(Y_i - \bar{Y})^T\|\\ &= \frac{1}{n-1}\sum_{i=1}^{n} \| X_i - \bar{X}\|_2 \|Y_i - \bar{Y}\|_2\\ &= \frac{n}{n-1} \| X_1 - \bar{X}\|_2 \|Y_1 - \bar{Y}\|_2\\ &\lesssim \frac{n}{n-1} \text{trace}(\Sigma)\text{trace}(\Gamma) \end{align*}
since
$$ \|X_i - \bar{X}\| = \|X_i - \mu \|+\|\mu - \bar{X}\| \lesssim \text{trace}(\Sigma) + \text{trace}(\Sigma/n) \lesssim \text{trace}(\Sigma) $$
and similarly for the $Y$'s. I am just wondering if a better bound can be achieved (in particular, is it possible to get a bound that decays with sample size