It is known that FFT can be used for computing the discrete convolution of complex sequences fastly, or even (using Strassen-Schonhage algorithm), convolution of sequences over any commutative ring $R$. But are there some methods for computing convolution of sequences fastly over non-commutative rings, for example, $M_2(\mathbb R)$?
It seems such a method will need lots of trick, because DFT relies on the structure of group algebra $\mathbb C(\mathbb Z/n\mathbb Z)$, but there isn't such a direct-sum decomposition for $(M_2(\mathbb R))(\mathbb Z/n\mathbb Z)$, which means $\{A_0+A_1x+A_2x^2+\cdots+A_{n-1}x^{n-1}|A_0,A_1,\cdots A_{n-1}\in M_2(\mathbb R)\}$ modulo $x^n-1$.
Fast Multiplication of Polynomials over Arbitrary Rings
On fast multiplication of polynomials over arbitrary algebras
These papers can help.