Problem. I would like to compute the following expectation: $$F(A, B) = \mathbb{E}_X\left[X X^T A X X^T B X X^T \right]$$ where $X$ is a $d\times n$ matrix with every entry being i.i.d. standard normal.
Approach. We can look at the above equation entry-wise as follows:
$$F_{ij}(A, B) = \sum_{k_1, k_2, k_3, k_4}^d \sum_{l_1, l_2,l_3}^n A_{k_1 k_2} B_{k_3 k_4} \underbrace{\mathbb{E}\left[x_{i l_1} \cdot x_{k_1 l_1} \cdot x_{k_2 l_2} \cdot x_{k_3 l_2} \cdot x_{k_4 l_3} \cdot x_{j l_3} \right]}_{g(i, j, k_1, k_2, k_3, k_4, l_1, l_2, l_3)}.$$
So, the problem reduces to computing $g$ for different values of the indices. Further, as the $1^{\text{st}}$, $3^{\text{rd}}$ and the $5^{\text{th}}$ moments of the standard normal variable are $0$, the function $g$ can take one of the following values: $$\mathbb{E}[x^6] = 15, \text{ or } \mathbb{E}[x^4]\mathbb{E}[x^2] = 3, \text{ or } \mathbb{E}[x^2]\mathbb{E}[x^2]\mathbb{E}[x^2] = 1,$$ where $x$ is a standard normal variable. Now, I tried to list the sub-cases for which the indices would lead to either of these three cases. While listing these for the first two cases was simpler, the way I am doing it leads to 30+ subcases for the third case.
I have written a sympy code which can compute $F_{ij}(A, B)$ as a degree $2$ polynomial involving terms like $A_{ij} B_{rs}$, but such lengthy form is not useful for my problem. In particular, I foresee from some calculations that the solution $F(A, B)$ would involve terms like $AB$, $AB^T$, $A^T B$, $(AB)^T$, $\text{tr}(A)\cdot B$, $\text{tr}(B)\cdot A$, $\text{tr}(AB)\cdot I$, $\text{tr}(A)\cdot \text{tr}(B) \cdot I$ etc., with coefficients which depend on $n$. Such a form would be useful.
Motivation. The problem comes in the analysis of single-layer linear transformers.
Any help in terms of continuing this approach, or a new approach altogether, would be highly appreciated.