While reading through Xu et al. (2016) I stumbled upon this proof:
$$ \begin{align} \mathbb{E}[Wyy^\intercal W^\intercal] & = \mathbb{E}[Wyy^\intercal W^\intercal - W\mathbb{E}[y]\mathbb{E}[y]^\intercal W^\intercal] + \mathbb{E}[W\mathbb{E}[y]\mathbb{E}[y]^\intercal W^\intercal]\\ & = \mathbb{E}[W\operatorname{var}(y)W^\intercal] + \mathbb{E}[W\mathbb{E}[y]\mathbb{E}[y]^\intercal W^\intercal] \end{align}$$
where $W$ and $y$ are independent.
What I don't get is how the first term becomes $\mathbb{E}[W\operatorname{var}(y)W^\intercal]$. It clearly has to do with $\operatorname{var}(X) = \mathbb{E}[X^2]-(\mathbb{E}X)^2$, but I don't see the intermediate steps that make it so.
$$ \mathbb{E}[Wyy^\intercal W^\intercal - W\mathbb{E}[y]\mathbb{E}[y]^\intercal W^\intercal] = \mathbb{E}[W(yy^\intercal - \mathbb{E}[y]\mathbb{E}[y]^\intercal)W^\intercal] = \mathbb{E}\big[W\,\mathbb{E}[yy^\intercal - \mathbb{E}[y]\mathbb{E}[y]^\intercal]\,W^\intercal\big] $$
The last equality follows by conditioning on $W$: since $W$ and $y$ are independent, $\mathbb{E}\big[W(yy^\intercal - \mathbb{E}[y]\mathbb{E}[y]^\intercal)W^\intercal \,\big|\, W\big] = W\,\mathbb{E}[yy^\intercal - \mathbb{E}[y]\mathbb{E}[y]^\intercal]\,W^\intercal$, and the tower property then removes the conditioning. (Note that you cannot instead pull $\mathbb{E}[W]$ out on each side: the two factors of $W$ are the same random matrix, so $\mathbb{E}[WAW^\intercal] \neq \mathbb{E}[W]A\mathbb{E}[W]^\intercal$ in general.)
But for a random vector $y$, by definition $\operatorname{var}(y) = \mathbb{E}[yy^\intercal] - \mathbb{E}[y]\mathbb{E}[y]^\intercal$.
So $$ \mathbb{E}\big[W\,\mathbb{E}[yy^\intercal - \mathbb{E}[y]\mathbb{E}[y]^\intercal]\,W^\intercal\big] = \mathbb{E}[W\operatorname{var}(y)W^\intercal], $$ and we are done.
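As a sanity check (not from the paper), here is a quick Monte Carlo verification of the identity $\mathbb{E}[Wyy^\intercal W^\intercal] = \mathbb{E}[W\operatorname{var}(y)W^\intercal] + \mathbb{E}[W\mathbb{E}[y]\mathbb{E}[y]^\intercal W^\intercal]$. The choices of distribution are arbitrary illustrations: $y$ is Gaussian with known mean and covariance, and $W$ has iid Bernoulli entries (a dropout-style mask), independent of $y$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200_000

# y ~ N(mu, Sigma), so E[y] = mu and var(y) = Sigma are known exactly
mu = np.array([1.0, -0.5, 2.0])
A = rng.standard_normal((d, d))
Sigma = A @ A.T

# W: random matrices with iid Bernoulli(1/2) entries, independent of y
W = rng.integers(0, 2, size=(n, d, d)).astype(float)
y = rng.multivariate_normal(mu, Sigma, size=n)

# Left side: Monte Carlo estimate of E[W y y^T W^T]
Wy = np.einsum('nij,nj->ni', W, y)                 # W_k y_k for each sample k
lhs = np.einsum('ni,nj->ij', Wy, Wy) / n

# Right side: E[W var(y) W^T] + E[W E[y] E[y]^T W^T],
# estimated with the same W samples
rhs = (np.einsum('nik,kl,njl->ij', W, Sigma, W) / n
       + np.einsum('nik,kl,njl->ij', W, np.outer(mu, mu), W) / n)

print(np.max(np.abs(lhs - rhs)))  # small Monte Carlo error
```

The two sides agree up to sampling noise, which shrinks as `n` grows.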