The problem
Given two real symmetric positive definite matrices $\Sigma_1$ and $\Sigma_2$, and real positive diagonal matrices $A_1$ and $A_2$, I want to compute the inverse of the product
$$ C = A_1 \cdot \Sigma_1 \cdot A_1 \cdot A_2 \cdot \Sigma_2 \cdot A_2 $$
efficiently, knowing $\Sigma_1$ and $\Sigma_2$ beforehand.
What I already do
As a first step, I compute $\Sigma_1^{-1}$ and $\Sigma_2^{-1}$ beforehand. Then, once I receive $A_1$ and $A_2$, I only need to invert their diagonal elements and form $A_1^{-1} \cdot \Sigma_1^{-1} \cdot A_1^{-1}$ and $A_2^{-1} \cdot \Sigma_2^{-1} \cdot A_2^{-1}$ by scaling the rows and columns of $\Sigma_1^{-1}$ and $\Sigma_2^{-1}$ by the appropriate diagonal entries (multiplication by a diagonal matrix is cheap), and finally compute the product
$$ C^{-1} = \left( A_2^{-1} \cdot \Sigma_2^{-1} \cdot A_2^{-1} \right) \cdot \left( A_1^{-1} \cdot \Sigma_1^{-1} \cdot A_1^{-1} \right) $$
I can also collapse $A_2^{-1} \cdot A_1^{-1}$ to avoid one multiplication by a diagonal matrix.
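A minimal NumPy sketch of this precompute-then-scale scheme (the random SPD test matrices and all names are illustrative, not part of the original setup); diagonal multiplications are done by broadcasting row/column scalings, and $A_2^{-1} A_1^{-1}$ is collapsed into a single column scaling:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4

# Known in advance: SPD matrices Sigma1, Sigma2 (random SPD here for illustration).
X1, X2 = rng.standard_normal((m, m)), rng.standard_normal((m, m))
Sigma1, Sigma2 = X1 @ X1.T + m * np.eye(m), X2 @ X2.T + m * np.eye(m)

# Precompute the inverses once.
S1inv, S2inv = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)

def c_inverse(a1, a2, S1inv=S1inv, S2inv=S2inv):
    """C^{-1} = A2^{-1} S2^{-1} (A2^{-1} A1^{-1}) S1^{-1} A1^{-1},
    with every diagonal factor applied as a row/column scaling."""
    a1i, a2i = 1.0 / a1, 1.0 / a2
    left = (a2i[:, None] * S2inv) * (a2i * a1i)[None, :]  # A2^{-1} S2^{-1} A2^{-1} A1^{-1}
    right = S1inv * a1i[None, :]                          # S1^{-1} A1^{-1}
    return left @ right                                   # the one remaining dense multiply

# Sanity check against direct inversion of C.
a1, a2 = rng.uniform(1, 2, m), rng.uniform(1, 2, m)
C = np.diag(a1) @ Sigma1 @ np.diag(a1) @ np.diag(a2) @ Sigma2 @ np.diag(a2)
assert np.allclose(c_inverse(a1, a2), np.linalg.inv(C))
```

The only $O(m^3)$ operation left per query is the final `left @ right`, which is exactly the multiplication the question asks to eliminate.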
However, for large enough matrices (say, $10^3 \times 10^3$), this last multiplication is still a significant performance issue, even though I no longer invert $C$ but only multiply two matrices to obtain $C^{-1}$. (The complexity of matrix inversion is essentially the same as that of matrix multiplication.)
I would like to get rid of the multiplication involving $\Sigma_1^{-1}$ and $\Sigma_2^{-1}$ during the actual computation of $C^{-1}$ by computing something beforehand.
General problem
Ultimately, I would like to compute products of the following form efficiently, knowing all the $\Sigma_i$ beforehand (I can precompute anything I need from them, but the $A_i$ are not known in advance):
$$ C = A_1 \cdot \Sigma_1 \cdot A_1 \cdot A_2 \cdot \Sigma_2 \cdot A_2 \cdots A_n \cdot \Sigma_n \cdot A_n. $$
Regarding the original ($n=2$) problem: it would suffice to be able to quickly compute the product $\Sigma_{2}^{-1} (A_2^{-1}A_1^{-1})\Sigma_1^{-1}$. In other words, given known matrices $P,Q$, we would like to efficiently implement a function that produces the product $PDQ^T$ for a diagonal matrix input $D$ (the transpose is redundant in your case, but is convenient for what follows).
Let $m$ be the size of these matrices. Let $p_1,\dots,p_m$ denote the columns of $P$, let $q_1,\dots,q_m$ denote the columns of $Q$, and let $d_1,\dots,d_m$ denote the diagonal entries of $D$. We can compute $$ PDQ^T = \sum_{k=1}^m d_k \, p_kq_k^T. $$ If $P$ and $Q$ are already known, then the rank-1 matrices $p_kq_k^T$ can be computed in advance.
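A small NumPy sketch of this rank-1 decomposition (random $P,Q$ and all names are illustrative): the $m$ outer products are precomputed once as an $(m, m, m)$ stack, and each query is then a weighted sum over that stack.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5
P, Q = rng.standard_normal((m, m)), rng.standard_normal((m, m))

# Precompute the m rank-1 matrices p_k q_k^T once:
# outers[k, i, j] = P[i, k] * Q[j, k], i.e. outers[k] = p_k q_k^T.
outers = np.einsum('ik,jk->kij', P, Q)

def pdqt(d, outers=outers):
    """Compute P @ diag(d) @ Q.T as sum_k d_k * p_k q_k^T."""
    return np.tensordot(d, outers, axes=1)

d = rng.standard_normal(m)
assert np.allclose(pdqt(d), P @ np.diag(d) @ Q.T)
```

Note that the weighted sum is still $O(m^3)$ work per query in general; the gain is that it is a single pass over precomputed data, and it truncates naturally if many $d_k$ are zero.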
Another approach that you might like is as follows: we have $$ \operatorname{vec}(PDQ^T) = (Q \otimes P)\operatorname{vec}(D) $$ where vec denotes (column-stacking) vectorization and $\otimes$ denotes a Kronecker product. In other words, we can reduce the desired multiplication to multiplying the $m^2 \times m^2$ matrix $Q \otimes P$ by the sparse vector $\operatorname{vec}(D)$ (which has at most $m$ non-zero entries).
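A NumPy sketch of the Kronecker formulation (again with illustrative random matrices): since $\operatorname{vec}(D)$ has non-zeros only at the positions corresponding to the diagonal, only $m$ columns of the precomputed $Q \otimes P$ ever participate in the product.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4
P, Q = rng.standard_normal((m, m)), rng.standard_normal((m, m))

# Precompute the m^2 x m^2 Kronecker factor once.
K = np.kron(Q, P)

def pdqt_vec(d, K=K, m=m):
    """vec(P D Q^T) = (Q ⊗ P) vec(D); vec(D) has only m non-zeros,
    so only m columns of K are needed."""
    idx = np.arange(m) * (m + 1)       # column-major vec: D[k, k] sits at k*m + k
    v = K[:, idx] @ d                  # (m^2 x m) times length-m vector
    return v.reshape(m, m, order='F')  # un-vec (column-major)

d = rng.standard_normal(m)
assert np.allclose(pdqt_vec(d), P @ np.diag(d) @ Q.T)
```

The restriction to those $m$ columns makes this exactly equivalent to the rank-1 sum above (each column of $Q \otimes P$ at index $k(m+1)$ is $\operatorname{vec}(p_k q_k^T)$), so the two formulations trade off storage layout rather than arithmetic.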
I'm not sure what algorithms exist for such computations, but I suspect that there is something out there for this kind of situation.