Facts: For matrices $A_i\in \mathbb{R}^{n\times n}$ with $i=1, 2, 3$, we have the following equation: $$ A_1\otimes A_2 \otimes A_3 = (A_1\otimes I_{n^2})(I_{n}\otimes A_2 \otimes I_n)(I_{n^2}\otimes A_3), $$ where $I_n$ represents an $n$ by $n$ identity matrix, and $\otimes$ denotes the Kronecker product.
My question is: how does this equation change if the matrices $A_i$ are not square, i.e., $A_i\in \mathbb{R}^{m\times n}$, where $m,n$ are not necessarily equal? And is there a way to prove the resulting identity?
Thank you very much!
Pulong
Equation for three matrices
Let $A_i$ be an $(r_i \times c_i)$ matrix for $i = 1,2,3$ and let $I_u$ denote the $(u \times u)$ identity matrix. Then, $$ A_1 \otimes A_2 \otimes A_3 = (A_1 \otimes I_{r_2} \otimes I_{r_3}) (I_{c_1} \otimes A_2 \otimes I_{r_3}) (I_{c_1} \otimes I_{c_2} \otimes A_3). \tag{1} $$
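Since the question asks for a proof, here is a short derivation sketch. It relies only on two standard facts that are not stated above: the mixed-product property $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$, valid whenever the products $AC$ and $BD$ are defined, and $I_u \otimes I_v = I_{uv}$. Grouping the identity blocks and multiplying the factors from left to right gives

$$
\begin{aligned}
(A_1 \otimes I_{r_2 r_3})\,\bigl(I_{c_1} \otimes (A_2 \otimes I_{r_3})\bigr)
&= (A_1 I_{c_1}) \otimes \bigl(I_{r_2 r_3} (A_2 \otimes I_{r_3})\bigr)
= A_1 \otimes A_2 \otimes I_{r_3}, \\
\bigl((A_1 \otimes A_2) \otimes I_{r_3}\bigr)\,(I_{c_1 c_2} \otimes A_3)
&= \bigl((A_1 \otimes A_2) I_{c_1 c_2}\bigr) \otimes (I_{r_3} A_3)
= A_1 \otimes A_2 \otimes A_3.
\end{aligned}
$$

The sizes match at every step: $A_1$ has $c_1$ columns, $A_2 \otimes I_{r_3}$ has $r_2 r_3$ rows, $A_1 \otimes A_2$ has $c_1 c_2$ columns, and $A_3$ has $r_3$ rows.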
General Equation
Equation (1) can be generalized to $n$ matrices. Let $n > 1$ be an integer and $A_i \in \mathbb{R}^{r_i \times c_i}$ for $i = 1,\ldots,n$. Davio (1981) proved that $$ A_1 \otimes \cdots \otimes A_n = \prod_{k = 1}^{n} ( I_{c_1} \otimes \cdots \otimes I_{c_{k-1}} \otimes A_{k} \otimes I_{r_{k+1}} \otimes \cdots \otimes I_{r_{n}} ), $$ where the factors are multiplied in order of increasing $k$ from left to right. This general formula can also be proven by induction on $n$.
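As a sanity check (not a proof), the general formula can be tested numerically on small rectangular matrices. The sketch below uses plain Python lists and integer entries so that the comparison is exact. The helper names `kron`, `matmul`, `identity`, `kron_all`, and `davio_factors`, as well as the example matrices, are made up for this illustration.

```python
from functools import reduce

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def matmul(A, B):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def identity(u):
    """u x u identity matrix."""
    return [[int(i == j) for j in range(u)] for i in range(u)]

def kron_all(mats):
    """Kronecker product of a non-empty list of matrices, left to right."""
    return reduce(kron, mats)

def davio_factors(mats):
    """k-th factor: identities of size c_i on the left of A_k (i < k)
    and identities of size r_i on the right of A_k (i > k)."""
    factors = []
    for k, A in enumerate(mats):
        left = [identity(len(M[0])) for M in mats[:k]]     # column counts c_i
        right = [identity(len(M)) for M in mats[k + 1:]]   # row counts r_i
        factors.append(kron_all(left + [A] + right))
    return factors

# Hypothetical example matrices with different shapes.
A1 = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
A2 = [[1, -1], [0, 2], [3, 1]]     # 3 x 2
A3 = [[2, 0, 1, -1]]               # 1 x 4

lhs = kron_all([A1, A2, A3])                        # explicit Kronecker product
rhs = reduce(matmul, davio_factors([A1, A2, A3]))   # product of the n factors
formula_holds = lhs == rhs
print(formula_holds)
```

Here the three factors have sizes $6 \times 9$, $9 \times 6$, and $6 \times 24$, so their product has the same $6 \times 24$ shape as $A_1 \otimes A_2 \otimes A_3$.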
It is also worth noting that the Shuffle algorithm (Plateau, 1985), which multiplies a vector by a Kronecker product of matrices, is based on this formula. Its advantage is that the Kronecker product itself is never explicitly formed. A discussion of vector-Kronecker product multiplication algorithms can be found here.
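To illustrate why the factorization helps, here is a minimal sketch of a factor-by-factor matrix-vector product. It is not a transcription of Plateau's algorithm, only the underlying idea: each factor $I_{\text{left}} \otimes A_k \otimes I_{\text{right}}$ is applied to the vector by index arithmetic, so neither the factor nor the full Kronecker product is stored. The function names `apply_factor` and `kron_matvec` and the test data are assumptions made for this example, which treats the vector as a column vector.

```python
from functools import reduce
from math import prod

def apply_factor(A, w, left, right):
    """Return (I_left kron A kron I_right) @ w without building that matrix.

    w is a flat list of length left * c * right, where A is r x c.
    """
    r, c = len(A), len(A[0])
    out = [0] * (left * r * right)
    for l in range(left):
        for i in range(r):
            for t in range(right):
                out[(l * r + i) * right + t] = sum(
                    A[i][j] * w[(l * c + j) * right + t] for j in range(c))
    return out

def kron_matvec(mats, v):
    """Compute (A_1 kron ... kron A_n) @ v one factor at a time."""
    w = list(v)
    # The rightmost factor of the product acts on the vector first.
    for k in reversed(range(len(mats))):
        left = prod(len(M[0]) for M in mats[:k])      # c_1 * ... * c_{k-1}
        right = prod(len(M) for M in mats[k + 1:])    # r_{k+1} * ... * r_n
        w = apply_factor(mats[k], w, left, right)
    return w

def kron(A, B):
    """Explicit Kronecker product, used here only to check the result."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

# Hypothetical example data.
A1 = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
A2 = [[1, -1], [0, 2], [3, 1]]     # 3 x 2
A3 = [[2, 0, 1, -1]]               # 1 x 4
v = list(range(1, 25))             # length 3 * 2 * 4 = 24

K = reduce(kron, [A1, A2, A3])     # 6 x 24, built only for comparison
y_explicit = [sum(a * x for a, x in zip(row, v)) for row in K]
y_fast = kron_matvec([A1, A2, A3], v)
print(y_fast == y_explicit)
```

The explicit product stores $\prod_i r_i c_i$ entries, whereas the factor-by-factor version only ever holds the individual $A_k$ and one intermediate vector.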
References
Davio, M. (1981). Kronecker products and shuffle algebra. IEEE Transactions on Computers, C-30(2), 116–125. doi: 10.1109/TC.1981.6312174
Plateau, B. (1985). On the stochastic structure of parallelism and synchronization models for distributed algorithms. SIGMETRICS Performance Evaluation Review, 13(2), 147–154. doi: 10.1145/317786.317819