I am currently working with systems of matrix equations, and I often come across expressions of the form $$\left( \sum_{i} \alpha_i\,{{\bf M}_i}^T{\bf D}_i{\bf M}_i \right) {\bf v},$$
where the ${\bf M}_i$ are convolution matrices (circulant or Toeplitz), the ${\bf D}_i$ are diagonal matrices, the $\alpha_i$ are scalars, and ${\bf v}$ is a vector.
I want to evaluate this quickly, preferably with as few memory reads and writes as possible.
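For what it's worth, when the ${\bf M}_i$ are circulant the whole operator can be applied matrix-free with FFTs: one forward/inverse transform pair per term plus an elementwise scale for ${\bf D}_i$, using the fact that the transpose of a real circulant matrix has the conjugate spectrum. A minimal sketch (function and variable names are mine, and I assume real kernels and vectors):

```python
import numpy as np

def apply_sum(kernels, diags, alphas, v):
    """Apply (sum_i alpha_i * M_i^T D_i M_i) v matrix-free,
    assuming each M_i is circulant with first column kernels[i].

    Circulant matvec:   M v   = ifft(fft(c) * fft(v))
    Transpose matvec:   M^T v = ifft(conj(fft(c)) * fft(v))   (real c)
    """
    V = np.fft.fft(v)
    out = np.zeros(len(v))
    for c, d, a in zip(kernels, diags, alphas):
        C = np.fft.fft(c)
        t = np.fft.ifft(C * V).real                    # M_i v  (circular convolution)
        t = d * t                                      # D_i (M_i v)  (elementwise scale)
        out += a * np.fft.ifft(np.conj(C) * np.fft.fft(t)).real  # alpha_i * M_i^T (...)
    return out
```

Note that $ \mathrm{fft}({\bf v})$ is computed once and reused across all terms, so each term costs two FFTs, one inverse FFT, and two elementwise products.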
I am aware that if ${\bf D}_i = {\bf I}$, then ${{\bf M}_i^T {\bf M}_i}$ simplifies to a single new convolution matrix whose kernel (the autocorrelation of the original kernel) is roughly twice as long, but these diagonal matrices seem to complicate things.
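In the circulant case that ${\bf D}_i = {\bf I}$ simplification is easy to check numerically: ${\bf M}^T{\bf M}$ is again circulant, its eigenvalues are $|\mathrm{fft}(c)|^2$, and its kernel is the circular autocorrelation of the original kernel, so the product collapses to one convolution. A quick sanity check (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
c = rng.standard_normal(n)   # hypothetical convolution kernel (first column of M)
v = rng.standard_normal(n)

# Explicit circulant matrix M with first column c.
M = np.array([[c[(j - k) % n] for k in range(n)] for j in range(n)])

# M^T M applied as a single convolution with spectrum |fft(c)|^2.
H = np.fft.fft(c)
combined = np.fft.ifft(np.abs(H) ** 2 * np.fft.fft(v)).real

# Reference: apply M, then M^T, explicitly.
direct = M.T @ (M @ v)
```

The two results agree to floating-point precision, which is exactly the "merged kernel" trick; with a nontrivial ${\bf D}_i$ sitting between the transforms, the elementwise scale in the time domain prevents the two spectra from being merged this way.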