Suppose I sample $k$ vectors from a normal distribution centered at $0$ with covariance $\Sigma$, then normalize them to have unit norm, and finally stack them as rows in a data matrix $X_k$.
I'd like to determine how big I can make $k$ such that $\|X_k\|<2$ (operator norm) with probability $p\approx 0.5$.
Are there results that can help me determine such $k$ from easily computable properties of $\Sigma$?
Empirically, on toy problems, it seems $k\approx 3\cdot\text{intrinsic dimension}(\Sigma)$, where the intrinsic dimension is defined as $\frac{\operatorname{Tr}(\Sigma)}{\|\Sigma\|}$.
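For concreteness, here is a minimal Monte Carlo sketch of the experiment described above (the function name, the decaying spectrum, and the trial count are my own illustrative choices, not part of the question):

```python
import numpy as np

def prob_norm_below_2(Sigma, k, trials=200, rng=None):
    """Monte Carlo estimate of P(||X_k|| < 2), where X_k stacks k
    unit-normalized samples from N(0, Sigma) as rows."""
    rng = np.random.default_rng(rng)
    d = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)
    hits = 0
    for _ in range(trials):
        X = rng.standard_normal((k, d)) @ L.T              # rows ~ N(0, Sigma)
        X /= np.linalg.norm(X, axis=1, keepdims=True)      # normalize rows to unit norm
        if np.linalg.norm(X, 2) < 2:                       # spectral (operator) norm
            hits += 1
    return hits / trials

# Toy example: diagonal Sigma with a decaying spectrum (illustrative choice)
d = 50
eigs = 1.0 / np.arange(1, d + 1)
Sigma = np.diag(eigs)
intrinsic_dim = eigs.sum() / eigs.max()    # Tr(Sigma) / ||Sigma||
k = int(3 * intrinsic_dim)                 # the conjectured threshold
print(k, prob_norm_below_2(Sigma, k, rng=0))
```

With this setup one can sweep $k$ and check where the estimated probability crosses $0.5$, to compare against the $3\cdot\operatorname{Tr}(\Sigma)/\|\Sigma\|$ heuristic.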
Motivation: this helps determine the largest batch size for which "linear learning rate scaling" works for batch-SGD on normalized linear least-squares problems.
