Let $m,d \to \infty$ with $m/d \to \rho \in (0,\infty)$. Let $z_1,\ldots,z_m$ be iid from $N(0,I_d)$ and let $A$ and $B$ be $d \times d$ be deterministic psd matrices. Define the random variable $S \ge 0$ by $$ S := \sum_{i,j}(z_i^\top A z_j)(z_i^\top B z_j) = \sum_{i,j} trace(Az_jz_i^\top Bz_jz_i^\top)=\sum_{i,j}trace(B D_{j,\ell}), $$ where $D_{j,\ell} := (w_j^\top w_\ell) w_jw_\ell^\top$, with $w_i := A^{1/2} z_i \sim N(0,A)$.
Question 1. For appropriately normalized $A$ and $B$ (e.g to ensure that $trace(A) = \mathcal O(1)$, etc.), is it possible to concentrate $s$ only using $\rho$ and information about the eigenvalues of $A$, $B$, and $AB$?
N.B. By "concentrate", I mean write $S=s_0 + o_{\mathbb P}(1)$ for some constant $s_0 \in [0,\infty)$.
For example, if $A = (1/d) I_d$, and $\lim_{d \to \infty }trace(B) = t \in [0,\infty)$, one can show that (see this post https://mathoverflow.net/a/396431/78539) $$ S \to s_0 = \rho(1+\rho)t, \tag{1} $$ in $L_2$ and thus in probability.
Thus, a satisfactory answer to the above question should recover this result at the least.
Notes
The method used to establish (1) doesn't seem to extend to case where neither $A$ nor $B$ is proportional identity matrix. I'm hoping for a general strategy for attacking the above problem, and similar ones.
We can write $S = f(z_1,\ldots,z_m)$, for a smooth (in fact, polynomial) function in the $z=(z_1,\ldots,z_m)$ random vector of size $md$ with components drawn iid from $N(0,1)$. If one could compute the mean and variance of $s$ (that is, limits thereof, in the limit $m,d \to \infty$ with $m/d \to \rho$), then one could use standard concentration inequalities for polynomials in Gaussians (e.g Hanson-Wright). This leads to the following question
Question 2. What is the mean and variance of $S$ ?
Note that since I'm only interested in concentrating $s$ in the asymptotic limit ($m,d\to \infty$ with $m/d \to \rho$), it is sufficient to only compute the dorminant terms of the mean and variance of $S$.
Disclaimer. This would be a rather long comment or edit to the question, so I decided to post it here to start a discussion on a route. The (insufficient) contribution here is to compute the asympotic mean of $S$, thus partially answering Question 1. I hope to come back here with ideas to compute the variance of $S$.
Change notation to $\Lambda := B$ and $\Gamma : = A$.
Assumption. Suppose the following limits exist and are finite:
For example, if $\Gamma = (1/d)I_d$, then $s=1$ and $t=\overline{t}=\lim_{d \to \infty}\mbox{tr}(\Lambda)$, giving $\langle S\rangle \to \rho(\rho + 1)t$.
Some experiments
Proof of the claim. Since $S:=\sum_{j,\ell}\mbox{tr}(\Lambda D_{j,\ell})$, where $D_{j,\ell} := w_j w_j^\top w_\ell w_\ell^\top = (w_j^\top w_\ell)w_jw_\ell^\top \in \mathbb R^{d \times d}$, the expectation of $S$ is given by (thanks to linearity of expectation and trace operators)
$$ \langle S\rangle = \sum_{j,\ell} \mbox{tr}(\Lambda \langle D_{j,\ell}\rangle). $$
To compute the mean matrix $\langle D_{j,\ell}\rangle \in \mathbb R^{d \times d}$, we distinguish between two cases.
$$ \langle D_{j,j}\rangle = \mbox{tr}(\Gamma)\langle w_jw_j^\top \rangle + E = \mbox{tr}(\Gamma)(\Gamma+E), $$
where $E$ is a $d \times d$ matrix with $\|E\|_{op} = o_{\mathbb P}(1)$ in the limit $d \to \infty$.
$$ \begin{split} \langle (D_{j,\ell})_{a,b}\rangle &= \langle (w_j^\top w_\ell) (w_jw_\ell^\top)_{a,b} \rangle = \langle (w_j^\top w_\ell) w_{a,j}w_{b,\ell}\rangle = \sum_c \langle w_{a,j}w_{c,j}w_{c,\ell}w_{b,\ell}\rangle\\ &= \sum_{c}\langle w_{a,j}w_{c,j}\rangle\langle w_{c,\ell}w_{b,\ell}\rangle = \sum_{c}\gamma_{a,c}\gamma_{c,b} =: (\Gamma^2)_{a,b}. % \langle w_{a,j}^2w_{a,\ell} w_{b,\ell} \rangle + \langle w_{a,j}w_{b,j}w_{b,\ell}^2\rangle + \sum_{c \ne a,b}\langle w_{a,j}w_{c,j}w_{c,\ell}w_{b,\ell}\rangle \end{split} $$
Invoking (1) then yields
$$ \begin{split} \langle S\rangle &= \sum_{j,l} \mbox{tr}(\Lambda\langle D_{j,\ell}\rangle) = \mbox{tr}(\Gamma)\sum_j\mbox{tr}(\Lambda(\Gamma + E)) + \sum_{j \ne \ell} \mbox{tr}(\Lambda\Gamma^2)\\ &=\frac{m}{d}\mbox{tr}(\Gamma)(d\cdot \mbox{tr}(\Lambda(\Gamma + E)))+\frac{m(m-1)}{d^2}(d^2\cdot \mbox{tr}(\Lambda\Gamma^2)) \to \rho st + \rho^2 \overline{t}. \end{split} $$
This completes the proof of the claim. $\quad\quad\quad\Box$