I need to understand what follows:
Given
- $X \in \mathbb{R}^{T \times n_0}$
- $W \in \mathbb{R}^{n_0 \times n}$ with $W_{ij} \sim \mathcal{N}\left(0, \frac{1}{n_0}\right)$ i.i.d.
- $\sigma : \mathbb{R} \rightarrow \mathbb{R}$ such that $\sigma(x) = \max(0,x)$
- $S = \sigma(XW)$ where $\sigma(\cdot)$ is applied element-wise
let us define the matrices
- $K = \mathbb{E}_W[S S^{T}]$
- $H = \mathbb{E}_W[S]\,\mathbb{E}_W[S]^{T}$
In general, $K$ and $H$ are different. What is the relationship between the matrices $K$ and $H$?
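To make the setup concrete, here is a minimal Monte Carlo sketch of both matrices, assuming NumPy. The function name `estimate_K_H`, the sample count, and the sizes $T=3$, $n_0=5$, $n=4$ are illustrative choices, not part of the problem statement. It only estimates $K$ and $H$ by sampling $W$; it does not give closed forms.

```python
import numpy as np

def estimate_K_H(X, n, num_samples=5000, seed=0):
    """Monte Carlo estimates of K = E_W[S S^T] and H = E_W[S] E_W[S]^T."""
    rng = np.random.default_rng(seed)
    T, n0 = X.shape
    K_sum = np.zeros((T, T))   # running sum of S S^T
    S_sum = np.zeros((T, n))   # running sum of S
    for _ in range(num_samples):
        # W_ij ~ N(0, 1/n0): `scale` is the standard deviation, so sqrt(1/n0)
        W = rng.normal(0.0, np.sqrt(1.0 / n0), size=(n0, n))
        S = np.maximum(X @ W, 0.0)   # element-wise ReLU
        K_sum += S @ S.T
        S_sum += S
    K = K_sum / num_samples
    S_mean = S_sum / num_samples
    H = S_mean @ S_mean.T
    return K, H

# Hypothetical input: a random 3 x 5 matrix X (T = 3, n0 = 5) and n = 4
X = np.random.default_rng(1).normal(size=(3, 5))
K, H = estimate_K_H(X, n=4)

# Eigenvalues of the symmetric difference K - H
eigenvalues = np.linalg.eigvalsh(K - H)
print(eigenvalues)
```

The printed eigenvalues of $K - H$ give a quick numerical view of how the two matrices differ for a given $X$.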
To me, this question is related to the "expectation of variance vs. variance of expectation" topic, but in this case there are two main differences: random matrices are involved, and I'm considering $S S^{T}$ instead of $S^{T} S$.
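As a sketch of how the scalar identity $\mathbb{E}[x^2] = \mathbb{E}[x]^2 + \mathrm{Var}(x)$ might carry over to this setting, expanding the product and using linearity of expectation gives the following. The symbols $s$, $w$ and $\mu$ are introduced here only for illustration: $w$ denotes a single column of $W$, $s = \sigma(Xw)$ the corresponding column of $S$, and $\mu = \mathbb{E}_w[s]$.

$$
K - H = \mathbb{E}_W\!\left[\left(S - \mathbb{E}_W[S]\right)\left(S - \mathbb{E}_W[S]\right)^{T}\right] \succeq 0 .
$$

Since the columns of $W$ are i.i.d., the $n$ columns of $S$ are i.i.d. copies of $s$, so under the definitions above

$$
K = n\,\mathbb{E}_w\!\left[s s^{T}\right], \qquad H = n\,\mu\mu^{T}, \qquad K - H = n\,\mathrm{Cov}(s) .
$$

In this reading, $H$ plays the role of the "squared mean" and $K - H$ the role of the variance, with a $T \times T$ covariance matrix in place of a scalar.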
(Optional: can you give me some references where these topics are explained at a beginner level?)