My question concerns matrix-vector multiplication in the case where the matrix has Kronecker structure, which allows the product to be computed faster.
I know how to compute this for a matrix $A = A_1 \otimes A_2$, which has two components $A_i$: $$Ax = (A_1 \otimes A_2)x = (A_1 \otimes A_2)vec(X) = vec(A_2XA_1^T)$$ where $vec(X) = x$ is the vectorization of $X$.
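For concreteness, this two-factor identity can be checked numerically; here is a small NumPy sketch (using column-major, i.e. Fortran-order, reshapes so that the standard $vec$ convention holds):

```python
import numpy as np

rng = np.random.default_rng(0)
A1 = rng.standard_normal((3, 4))        # r1 x c1
A2 = rng.standard_normal((5, 2))        # r2 x c2
x = rng.standard_normal(4 * 2)          # length c1 * c2

# X is (c2 x c1) so that vec(X) = x under column-major vectorization
X = x.reshape(2, 4, order="F")
y_fast = (A2 @ X @ A1.T).reshape(-1, order="F")   # vec(A2 X A1^T)
y_slow = np.kron(A1, A2) @ x                      # explicit Kronecker product

print(np.allclose(y_fast, y_slow))      # True
```

The fast path never forms the $(15 \times 8)$ Kronecker product, only the two small matrix products.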
However, I have no idea how to proceed for more components $A_i$. I can imagine doing something as follows: $$Ax = (A_1 \otimes A_2 \otimes \cdots \otimes A_n)x = vec((A_2 \otimes \cdots A_n)XA_1^T) = (I_m \otimes A_2 \otimes \cdots \otimes A_n)vec(XA_1^T)$$ which provides me again with an actual matrix-vector multiplication. I was hoping to get a large identity matrix on the left-hand side this way, but no luck.
(EDIT: I realised it is impossible to do it this way, as the matrices $X$ and $A_1^T$ do not have the same dimensions.)
I tried looking it up on-line, but I mainly get the case for two components. Can anyone of you help me out? Thanks!
I will start by briefly introducing vector-Kronecker multiplication algorithms and related publications. Then I will try to explain how these algorithms work with a small running example. However, discussing these algorithms is not trivial because of the multi-dimensional indexing involved. If you want to implement them, you should read the related papers and work through some examples with Kronecker products of three matrices.
Introduction
The Shuffle algorithm (Plateau, 1985) is an efficient vector-Kronecker product algorithm that computes the product of a vector with a Kronecker product of rectangular matrices. Another approach, the Pot-RwCl algorithm, was proposed by Buchholz et al. (2000); there, the nonzero elements of the larger matrix are generated on the fly and multiplied with the corresponding elements of the vector.
Dayar (2012) discusses the Shuffle algorithm and its use in the numerical solution of multi-dimensional Markov chains. Dayar and Orhan (2015) propose a modification of the Shuffle algorithm so that it requires fewer floating-point operations when there are zero rows and columns in the smaller matrices.
Fackler (2019) proposes further improvements, showing how the shuffling of data can be (largely) avoided and providing a simple method to determine the optimal ordering of the workflow. You may also find a MATLAB implementation by Gleich.
You may find detailed pseudocode and examples for the Shuffle, Pot-RwCl, and modified Shuffle algorithms in (Dayar and Orhan, 2015). Note that since these algorithms are applied to Markov chains, they multiply the vector from the left of the Kronecker product of matrices.
Problem
Let $$ x \in \mathbb{R}^{c_1 \ldots c_n \times 1} ~~~\text{ and }~~~ A = (A_1 \otimes \ldots \otimes A_n), $$ where $A_k \in \mathbb{R}^{r_k \times c_k}$ for $k = 1, \ldots, n$. We are interested in computing $$ y = A x $$ without explicitly generating the matrix $A$. Without loss of generality, assume that the $A_k$ matrices are $0$-indexed; that is, the row and column indices are $\{0,\ldots,r_k-1\}$ and $\{0,\ldots,c_k-1\}$, respectively.
Notation becomes cumbersome once multi-dimensional indices are introduced. However, it is much more difficult to discuss an algorithm involving Kronecker products (or tensors) without using multi-dimensional indices. Let $A_k(i_k,j_k)$ denote the entry in row $i_k$ and column $j_k$ of $A_k$; then $A$ can be written elementwise as $$ A((i_1,\ldots,i_n),(j_1,\ldots,j_n)) = \prod_{k=1}^{n} A_k(i_k,j_k) $$ by the definition of the Kronecker product.
It will also be useful to define a subvector of a column vector. We let $$ x(i_1,\ldots,i_{k-1},:,i_{k+1},\ldots, i_n) = \left[ \begin{array}{c} x(i_1,\ldots,i_{k-1},0,i_{k+1},\ldots, i_n) \\ \vdots \\ x(i_1,\ldots,i_{k-1},c_{k}-1,i_{k+1},\ldots, i_n) \end{array} \right] $$ where $i_{k'} \in \{0,\ldots,c_{k'}-1\}$ for $k' = 1,\ldots,k-1,k+1,\ldots,n$.
Shuffle Algorithm
This algorithm is based on Shuffle algebra (Davio, 1981). Therein, it is shown that $$ A = \prod_{k = 1}^{n} ( I_{c_1} \otimes \ldots \otimes I_{c_{k-1}} \otimes A_{k} \otimes I_{r_{k+1}} \otimes \ldots \otimes I_{r_{n}} ) = \prod_{k = 1}^{n} ( I_{c_1 \ldots c_{k-1}} \otimes A_{k} \otimes I_{r_{k+1} \ldots r_{n}} ), \tag{1}\label{kronecker_identity} $$ where $I_{u}$ denotes $(u \times u)$ identity matrix.
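As a sanity check, the factorization (1) can be verified numerically; the sketch below uses three small rectangular factors with arbitrarily chosen sizes:

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(1)
r = [2, 3, 2]                           # row sizes r_k
c = [3, 2, 4]                           # column sizes c_k
A = [rng.standard_normal((r[k], c[k])) for k in range(3)]

def factor(k):
    # I_{c_1...c_{k-1}}  (x)  A_k  (x)  I_{r_{k+1}...r_n}  from Equation (1)
    left = int(np.prod(c[:k]))          # np.prod of an empty slice is 1
    right = int(np.prod(r[k+1:]))
    return np.kron(np.eye(left), np.kron(A[k], np.eye(right)))

full = reduce(np.kron, A)               # explicit A_1 (x) A_2 (x) A_3
prod = factor(0) @ factor(1) @ factor(2)
print(np.allclose(full, prod))          # True
```

Each factor in the product is sparse and structured, which is what the Shuffle algorithm exploits: it applies them one at a time instead of forming $A$.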
The Shuffle algorithm initializes a temporary vector $z^{(n)}$ to $x$. Then it computes the intermediate vectors $$ z^{(k-1)} = ( I_{c_1 \ldots c_{k-1}} \otimes A_{k} \otimes I_{r_{k+1} \ldots r_{n}} ) z^{(k)} \tag{2}\label{shuffle_step} $$ for each factor $k = n,\ldots,1$. At the final step, $y = z^{(0)}$ is obtained.
Now come more equations with many indices. Factor $k$ in Equation (1) can be written elementwise as $$ ( I_{c_1 \ldots c_{k-1}} \otimes A_{k} \otimes I_{r_{k+1} \ldots r_{n}} )((i_1,\ldots,i_n),(j_1,\ldots,j_n)) = \begin{cases} A_{k}(i_k,j_k) & \text{if } i_{k'} = j_{k'} \text{ for } k' \neq k \\ 0 & \text{otherwise} \end{cases} $$ where \begin{align} i_{k'}, j_{k'} \in \{ 0,\ldots,c_{k'}-1 \} & ~~~ \text{ for } ~~ k' = 1,\ldots,k-1 \\ i_{k'} \in \{ 0,\ldots,r_{k'}-1 \} ~~~ \text{ and } ~~~ j_{k'} \in \{ 0,\ldots,c_{k'}-1 \} & ~~~ \text{ for } ~~ k' = k \\ i_{k'}, j_{k'} \in \{ 0,\ldots,r_{k'}-1 \} & ~~~ \text{ for } ~~ k' = k+1, \ldots, n . \end{align}
Then, letting $z^{(k)} \in \mathbb{R}^{c_{1} \ldots c_{k} r_{k+1} \ldots r_{n}}$, Equation (2) can be written as $$ z^{(k-1)}(i_1,\ldots,i_{k-1},:,i_{k+1},\ldots,i_n) = A_{k} z^{(k)}(i_1,\ldots,i_{k-1},:,i_{k+1},\ldots, i_n) \tag{3}\label{elementwise_shuffle_step} $$ where \begin{align} i_{k'} \in \{ 0,\ldots,c_{k'}-1 \} & ~~~ \text{ for } ~~ k' = 1,\ldots,k-1 \\ i_{k'} \in \{ 0,\ldots,r_{k'}-1 \} & ~~~ \text{ for } ~~ k' = k+1, \ldots, n . \end{align}
Finally, pseudocode of the shuffle algorithm can be written as
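In essence the pseudocode is a loop over Equation (3): for each $k = n,\ldots,1$, apply $A_k$ to every subvector along the $k$-th dimension. A minimal NumPy sketch of this idea (assuming the row-major flat-index convention $\sum_k i_k \prod_{k'>k} c_{k'}$, and using a reshape to group the indices around dimension $k$) could look like:

```python
import numpy as np
from functools import reduce

def kron_matvec(As, x):
    """Compute (A_1 (x) ... (x) A_n) x without forming the Kronecker product.

    Sketch of the Shuffle idea: for k = n,...,1 apply A_k along the k-th
    axis of the reshaped vector; each pass realizes Equation (3).
    """
    n = len(As)
    r = [A.shape[0] for A in As]
    c = [A.shape[1] for A in As]
    z = np.asarray(x, dtype=float)                # z^{(n)} = x
    for k in range(n - 1, -1, -1):                # k = n,...,1 (0-indexed)
        left = int(np.prod(c[:k]))                # c_1 ... c_{k-1}
        right = int(np.prod(r[k+1:]))             # r_{k+1} ... r_n
        Z = z.reshape(left, c[k], right)          # group indices around axis k
        z = np.einsum("ij,ajb->aib", As[k], Z).reshape(-1)
    return z                                      # y = z^{(0)}

# Check against the explicit Kronecker product.
rng = np.random.default_rng(2)
As = [rng.standard_normal(s) for s in [(2, 3), (4, 2), (3, 3)]]
x = rng.standard_normal(3 * 2 * 3)
print(np.allclose(kron_matvec(As, x), reduce(np.kron, As) @ x))  # True
```

This is only an illustrative sketch; the published pseudocode also covers the data shuffling and the optimizations for zero rows and columns discussed above.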
Pot-RwCl Algorithm
This algorithm was initially proposed for square matrices, but it can easily be extended to rectangular matrices. The idea is to obtain (but not store) the elements of the large matrix recursively, by visiting each element of each factor and multiplying it with the corresponding vector element.
In the pseudocode of this algorithm, it is more convenient to use flat indices instead of multi-dimensional indices. Note that the multi-dimensional index $(i_1,\ldots,i_n)$ of a vector $x \in \mathbb{R}^{c_1 \ldots c_n \times 1}$ corresponds to the flat index $\sum_{k=1}^{n} i_k \prod_{k'=k+1}^{n} c_{k'}$.
Pseudocode of this algorithm can be written as
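To illustrate just the on-the-fly idea (not the actual Pot-RwCl pseudocode, which is more refined), here is a naive recursive sketch: the flat row and column indices are built up factor by factor, and each complete element of $A$ is multiplied with the matching entry of $x$ the moment it is generated, so $A$ is never stored.

```python
import numpy as np
from functools import reduce

def kron_matvec_onthefly(As, x):
    """Naive on-the-fly evaluation of y = (A_1 (x) ... (x) A_n) x.

    Elements of the Kronecker product are generated recursively from the
    factors and never stored; flat indices follow the row-major
    convention sum_k i_k * prod_{k' > k} c_{k'}.
    """
    n = len(As)
    r = [A.shape[0] for A in As]
    c = [A.shape[1] for A in As]
    y = np.zeros(int(np.prod(r)))

    def recurse(k, row, col, val):
        if val == 0.0:                  # skip subtrees rooted at a zero element
            return
        if k == n:                      # val is a complete element A(row, col)
            y[row] += val * x[col]
            return
        for i in range(r[k]):
            for j in range(c[k]):
                recurse(k + 1, row * r[k] + i, col * c[k] + j,
                        val * As[k][i, j])

    recurse(0, 0, 0, 1.0)
    return y

# Check against the explicit Kronecker product.
rng = np.random.default_rng(3)
As = [rng.standard_normal(s) for s in [(2, 2), (3, 2), (2, 3)]]
x = rng.standard_normal(2 * 2 * 3)
print(np.allclose(kron_matvec_onthefly(As, x), reduce(np.kron, As) @ x))
```

Note that for dense factors this naive version costs as much as multiplying by the explicit Kronecker product; the actual Pot-RwCl algorithm reuses partial products and exploits sparsity, which is where its efficiency comes from.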
References
Davio, M. (1981). Kronecker products and shuffle algebra. IEEE Transactions on Computers, 100(2), 116-125. doi: 10.1109/TC.1981.6312174
Plateau, B. (1985). On the stochastic structure of parallelism and synchronization models for distributed algorithms. SIGMETRICS Performance Evaluation Review, 13(2), 147–154. doi: 10.1145/317786.317819
Buchholz, P., Ciardo, G., Donatelli, S., & Kemper, P. (2000). Complexity of memory-efficient Kronecker operations with applications to the solution of Markov models. INFORMS Journal on Computing, 12(3), 203-222. doi: 10.1287/ijoc.12.3.203.12634
Dayar, T. (2012). Analyzing Markov chains using Kronecker products: theory and applications. Springer Science & Business Media. doi: 10.1007/978-1-4614-4190-8
Dayar, T., & Orhan, M. C. (2015). On vector-Kronecker product multiplication with rectangular factors. SIAM Journal on Scientific Computing, 37(5), S526-S543. doi: 10.1137/140980326
Fackler, P. L. (2019). Algorithm 993: Efficient computation with Kronecker products. ACM Transactions on Mathematical Software (TOMS), 45(2), 1-9. doi: 10.1145/3291041