The following is from page 29 of the paper "The Modern Mathematics of Deep Learning". It is the source of my question.
I don't know how to prove the formula in the title. I think equation (1) is equivalent to it, so once the formula in the title is proved, equation (1) follows. Could anybody help me prove the formula in the title?
Let us focus on a single sample $m=1$ and features $X$ that take values in $\mathcal{X}=\{-1,1\}^d.$ Then it holds that $\Sigma^{+}=\frac{X^{(1)}(X^{(1)})^T}{\left\|X^{(1)}\right\|^4}=\frac{X^{(1)}\left(X^{(1)}\right)^{T}}{d^{2}}$ and thus $$\mathbb{E}\left[\operatorname{Tr}\left(\Sigma^{+} \mathbb{E}\left[X X^T\right]\right)\right]=\frac{1}{d^2}\left\|\mathbb{E}\left[X X^T\right]\right\|_F^2 \tag{1}$$
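
A possible route to (1), sketched under two assumptions that are not spelled out in the excerpt: that $\Sigma=X^{(1)}\left(X^{(1)}\right)^T$ is the (uncentered) empirical covariance for $m=1$, and that the sample $X^{(1)}$ has the same distribution as $X$. For any nonzero vector $v$ the pseudoinverse of the rank-one matrix $vv^T$ is $vv^T/\|v\|^4$, and $\left\|X^{(1)}\right\|^2=d$ because every entry is $\pm 1$. Writing $M=\mathbb{E}\left[XX^T\right]$, which is a fixed symmetric matrix, linearity of the trace and of the expectation would then give

$$\begin{aligned}
\mathbb{E}\left[\operatorname{Tr}\left(\Sigma^{+} M\right)\right]
&=\frac{1}{d^2}\,\mathbb{E}\left[\operatorname{Tr}\left(X^{(1)}\left(X^{(1)}\right)^T M\right)\right] \\
&=\frac{1}{d^2}\operatorname{Tr}\left(\mathbb{E}\left[X^{(1)}\left(X^{(1)}\right)^T\right] M\right) \\
&=\frac{1}{d^2}\operatorname{Tr}\left(M^T M\right)
=\frac{1}{d^2}\left\|M\right\|_F^2 .
\end{aligned}$$

The last line uses $\mathbb{E}\left[X^{(1)}\left(X^{(1)}\right)^T\right]=M=M^T$ and the identity $\|A\|_F^2=\operatorname{Tr}\left(A^TA\right)$. If the paper defines $\Sigma$ differently (for example with centering), this sketch would need to be adjusted.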