Kronecker product and vectorization of tensor modal product

195 Views Asked by At

Given a tensor $\mathcal{S}\in\mathbb{R}^{n_1\times\ldots\times n_d}$ and a matrix $M\in \mathbb{R}^{m_k\times n_k}$, let $\mathcal{A}$ be the modal product of $\mathcal{S}$ and $M$ along the $k$-th mode, i.e. $\mathcal{A}_{(k)}=M \mathcal{S}_{(k)}$. I would like to show the formula $$\operatorname{vec}({\mathcal{A}})=(I_{n_{k+1}\ldots n_d}\otimes M\otimes I_{n_1\ldots n_{k-1}})\cdot \operatorname{vec}(\mathcal{S}).$$ I understand how to show it when $k = 1$ because then $$\operatorname{vec}(\mathcal{A}) = \operatorname{vec}(\mathcal{A}_{(1)})=\operatorname{vec}(M\mathcal{S}_{(1)}) = \operatorname{vec}(M\mathcal{S}_{(1)}I_{n_2\ldots n_d})=(I_{n_2\ldots n_d}\otimes M)\operatorname{vec}(\mathcal{S}_{(1)}) = (I_{n_2}\otimes \ldots \otimes I_{n_d}\otimes M)\operatorname{vec}(\mathcal{S})$$ where I used the identiy $\operatorname{vec}(ABC) = (C^T\otimes A)\operatorname{vec}(B)$. However, if $k\neq 1$, it seems we don't have the identity $\operatorname{vec}(\mathcal{A}) = \operatorname{vec}(\mathcal{A}_{(k)})$, so how do I deal with those cases?

NB: Here $\mathcal{A}_{(k)}$ denotes the modal unfolding of the tensor $\mathcal{A}$ along the mode $k$ which is completely defined by $$\mathcal{A}_{(k)}(\hat{i}_k, \operatorname{col}(\hat{i}_k, \hat{n}_k)) = \mathcal{A}(i)$$ where $\hat{i}_k=[i_1,\ldots, i_{k-1},i_{k+1},\ldots i_d]$, $\hat{n}_k=[n_1,\ldots, n_{k-1},n_{k+1},\ldots n_d]$ and $\operatorname{col}(i,n)=i_1+(i_2-1)n_1+(i_3-1)n_1n_2+\ldots + (i_d-1)n_1\ldots n_{d-1}$.

2

There are 2 best solutions below

10
On BEST ANSWER

One approach is to note that $\mathcal S$ can be decomposed as the sum of simple tensors, $$ \mathcal S =\sum_{j=1}^r \bigotimes_{i=1}^d S_{ij} = \sum_{j=1}^r S_{1,j} \bar \otimes \cdots \bar \otimes S_{k-1,j} \bar \otimes S_{k,j}\bar \otimes S_{k+1,j} \bar \otimes \cdots \bar \otimes S_{d,j}, $$ where $S_{ij} \in \Bbb R^{n_i}$ for all $j$ and $\bar \otimes$ denotes the "tensor product". Let $\phi_M^{(k)}$ denote the linear map associated with the modal product taken here. That is, $\phi_{M}^{(k)}$ is defined so that $\phi_M^{(k)}(\mathcal S) = \mathcal A$.

Note/show the following facts:

  • For all $j$, $$ \phi_M^{(k)}[S_{1,j} \bar \otimes \cdots \bar \otimes S_{k-1,j} \bar \otimes S_{k,j}\bar \otimes S_{k+1,j} \bar \otimes \cdots \bar \otimes S_{d,j}] = \\ S_{1,j} \bar \otimes \cdots \bar \otimes S_{k-1,j} \bar \otimes (MS_{k,j})\bar \otimes S_{k+1,j} \bar \otimes \cdots \bar \otimes S_{d,j} $$

  • For all $j$, $$ \operatorname{vec}[S_{1,j} \bar \otimes \cdots \bar \otimes S_{k-1,j} \bar \otimes S_{k,j}\bar \otimes S_{k+1,j} \bar \otimes \cdots \bar \otimes S_{d,j}] = \\ S_{d,j} \otimes \cdots \otimes S_{k+1,j} \otimes S_{k,j} \otimes S_{k-1,j} \otimes \cdots \otimes S_{1,j} $$

  • Denoting $ \hat M_k = I_{n_{k+1}\ldots n_d}\otimes M\otimes I_{n_1\ldots n_{k-1}}$, we have $$ \hat M_k(S_{d,j} \otimes \cdots \otimes S_{k+1,j} \otimes S_{k,j} \otimes S_{k-1,j} \otimes \cdots \otimes S_{1,j}) = \\ S_{d,j} \otimes \cdots \otimes S_{k+1,j} \otimes (MS_{k,j}) \otimes S_{k-1,j} \otimes \cdots \otimes S_{1,j} $$

From there, the proof is straightforward:

\begin{align} \operatorname{vec}(\mathcal A) &= \operatorname{vec} \left[\phi_M^{(k)}(\mathcal S)\right] \\ & = \operatorname{vec} \left[\phi_M^{(k)}\left[\sum_{j=1}^r S_{1,j} \bar \otimes \cdots \bar \otimes S_{k-1,j} \bar \otimes S_{k,j}\bar \otimes S_{k+1,j} \bar \otimes \cdots \bar \otimes S_{d,j} \right]\right] \\ & = \operatorname{vec} \left[\sum_{j=1}^r \phi_M^{(k)}\left[ S_{1,j} \bar \otimes \cdots \bar \otimes S_{k-1,j} \bar \otimes S_{k,j}\bar \otimes S_{k+1,j} \bar \otimes \cdots \bar \otimes S_{d,j} \right]\right] \\ & = \operatorname{vec} \left[\sum_{j=1}^r S_{1,j} \bar \otimes \cdots \bar \otimes S_{k-1,j} \bar \otimes (MS_{k,j})\bar \otimes S_{k+1,j} \bar \otimes \cdots \bar \otimes S_{d,j} \right] \\ & = \sum_{j=1}^r \operatorname{vec} \left[ S_{1,j} \bar \otimes \cdots \bar \otimes S_{k-1,j} \bar \otimes (MS_{k,j})\bar \otimes S_{k+1,j} \bar \otimes \cdots \bar \otimes S_{d,j} \right] \\ & = \sum_{j=1}^r S_{d,j} \otimes \cdots \otimes S_{k+1,j} \otimes (MS_{k,j}) \otimes S_{k-1,j} \otimes \cdots \otimes S_{1,j} \\ & = \sum_{j=1}^r \hat M\left[S_{d,j} \otimes \cdots \otimes S_{k+1,j} \otimes S_{k,j} \otimes S_{k-1,j} \otimes \cdots \otimes S_{1,j}\right] \\ & = \hat M\left[\sum_{j=1}^r S_{d,j} \otimes \cdots \otimes S_{k+1,j} \otimes S_{k,j} \otimes S_{k-1,j} \otimes \cdots \otimes S_{1,j}\right] \\ & = \hat M\left[\sum_{j=1}^r \operatorname{vec}\left[S_{1,j} \bar \otimes \cdots \bar \otimes S_{k-1,j} \bar \otimes S_{k,j}\bar \otimes S_{k+1,j} \bar \otimes \cdots \bar \otimes S_{d,j}\right]\right] \\ & = \hat M \operatorname{vec}\sum_{j=1}^r \left[S_{1,j} \bar \otimes \cdots \bar \otimes S_{k-1,j} \bar \otimes S_{k,j}\bar \otimes S_{k+1,j} \bar \otimes \cdots \bar \otimes S_{d,j}\right] = \hat M \operatorname{vec}(\mathcal S), \end{align}

which is what we wanted.


We could also save ourselves a lot of notational headache if we prove it for $k = 1$, then generalize that proof as follows. Let $\tau$ denote the "transpose" map over $\Bbb R^{n_1 \times \cdots \times n_d}$ that switches dimensions $1$ and $k$ of $\mathcal S$. If I've understood your terminology correctly, then I believe this means we're swapping the $1$st and $k$th "modes". Similarly, let $\sigma$ denote the map that does the same thing for $\mathcal A$ (noting that $\mathcal S$ and $\mathcal A$ have different shapes).

Note that there is a permutation matrix $P_\tau$ such that $$ \operatorname{vec}(\tau(\mathcal S)) = P \operatorname{vec}(\mathcal S) $$ holds for all $\mathcal S \in \Bbb R^{n_1 \times \cdots \times n_d}$, and similarly there is a corresponding $P_\sigma$. We can write \begin{align} \operatorname{vec}(\phi_M^{(k)} \mathcal S) & = \operatorname{vec}(\sigma^{-1}(\phi_M^{(1)} \tau(\mathcal S))) \\ & = P_{\sigma}^{-1}\operatorname{vec}(\phi_M^{(1)} \tau(\mathcal S)) \\ & = P_{\sigma}^{T}\hat M_{1} \operatorname{vec}(\tau(\mathcal S)) = P_{\sigma}^{T} \hat M_1 P_{\tau} \operatorname{vec}(\mathcal S). \end{align}

So, it suffices to show that $P_{\sigma}^{T} \hat M_1 P_{\tau} = \hat M_k$. To that end, it suffices to show that $v^\top P_{\sigma}^{T} \hat M_1 P_{\tau} w = v^\top \hat M_k w$ for all appropriately sized column-vectors $v,w$. Moreover, it suffices to show that this holds for vectors of the form $v = v_1 \otimes v_2 \otimes v_3$ and $w = w_1 \otimes w_2 \otimes w_3$ where $v_1,w_1 \in \Bbb R^{n_1 \cdots n_{k-1}}$, $v_2 \in \Bbb R^{m_k}$, $w_2 \in \Bbb R^{n_k}$, and $v_3,w_3 \in \Bbb R^{n_{k+1} \cdots n_d}$.

0
On

Notations : $\mathcal{A}\circ\mathcal{B}$ is the tensor product, $\mathcal{A}\otimes\mathcal{B}$ the kronecker product, $\mathcal{A}\times_{\ell} M$ is the modal product along the $\ell$-th mode and $e_i^n$ is the i-th canonical vector of $\mathbb{R}^n$.

We can express a tensor $\mathcal{A}$ of $\mathbb{R}^{n_1\times...\times n_d}$ in term of the basis vectors $\mathcal{A} = \sum\limits_{i_1,\ldots,i_d}a_{i_1\ldots, i_d}e_{i_1}^{n_1}\circ\ldots\circ e_{i_d}^{n_d}$.

then \begin{align*}\mathcal{A}\times_{\ell}M &= \sum\limits_{i_1,\ldots,i_d}a_{i_1,\ldots,i_d}(e_{i_1}^{n_1}\circ \ldots \circ e_{i_d}^{n_d})\times_{\ell}M \text{ by linearity of }\times_{\ell}\\ &=\sum\limits_{i_1,\ldots,i_d}a_{i_1,\ldots,i_d}(e_{i_1}^{n_1}\circ\ldots\circ Me_{i_{\ell}^{n_{\ell}}}\circ\ldots\circ e_{i_d}^{n_d})\end{align*}

because in general, applying the n-fold product is just doing the matrix multiplication on each fibers of the tensor, so in coordinates, for a tensor $\mathcal{X}$, $$(\mathcal{X}\times_{\ell} M)_{i_1\ldots i_d} = \sum\limits_s M_{i_{\ell}s}\mathcal{X}_{i_1\ldots i_{\ell-1}i_s i_{\ell+1}\ldots i_d}$$ so for a tensor which is just the tensor product of vectors, we have $$((x^1\circ\ldots\circ x^d)\times_{\ell}M)_{i_1\ldots i_d}=\sum\limits_s M_{i_{\ell}s}x^1_{i_1}\ldots x^{\ell-1}x^{\ell}_sx^{\ell+1}_{i_{\ell+1}}\ldots x^d_{i_d} = x^1_1\circ \ldots\circ (Mx^{\ell})\circ\ldots \circ x^d$$

From the above we have \begin{align*} \operatorname{Vec}(\mathcal{A})&=\sum\limits_{i_1,\ldots,i_d}a_{i_1\ldots, i_d}\operatorname{Vec}(e_{i_1}^{n_d}\circ\ldots\circ (Me^{n_{\ell}}_{i_{\ell}})\circ\ldots\circ e_{i_d}^{n_d})\text{ by linearity of }\operatorname{Vec(\cdot)}\\ &= \sum\limits_{i_1,\ldots,i_d}a_{i_1\ldots, i_d} e^{n_d}_{i_d}\otimes\ldots\otimes Me_{e_{i_{\ell}}^{n_{\ell}}}\otimes\ldots\otimes e^{n_1}_{i_1} \end{align*}

because $\operatorname{Vec(x^1\circ\ldots\circ x^d)=x^d\otimes \ldots\otimes x^1}$, i.e. the $\operatorname{Vec}(\cdot)$ transforms tensor product of vectors into kronecker proudct but reverses the order of the vectors. Now using the mixed product property of kronecker products (i.e. using induction on the formula $(A\otimes B)(C\otimes D) = (AC)\otimes (BD)$), we have

\begin{align*} e_{i_d}^{n_d}\otimes\ldots\otimes Me_{i_{\ell}}^{n_{\ell}}\otimes\ldots\otimes e_{i_1}^{n_1} &= I_{n_d}e_{i_d}^{n_d}\otimes\ldots\otimes Me_{i_{\ell}}^{n_{\ell}}\otimes\ldots\otimes I_{n1}e_{i_1}^{n_1}\\ &= (I_{n_d}\otimes\ldots\otimes M \otimes \ldots\otimes I_{n_1})(e_{i_d}^{n_d}\otimes\ldots\otimes e_{i_1}^{n_1}) \end{align*}

Now "reversing the steps" we get

\begin{align*} \operatorname{Vec}(\mathcal{A}\times_{\ell}M) &= \sum\limits_{i_1,\ldots,i_d}a_{i_1,\ldots,i_d}(I_{n_d}\otimes\ldots\otimes M\otimes\ldots\otimes I_{n_1})\underbrace{(e_{i_1}^{n_d}\otimes\ldots\otimes e_{i_1}^{n_1})}_{\operatorname{Vec}(e_{i_1}\circ\ldots\circ{e_{i_d}})}\\ &=(I_{n_d}\otimes\ldots\otimes M\otimes\ldots\otimes I_{n_1})\operatorname{Vec}\left(\underbrace{\sum\limits_{i_1,\ldots,i_d}a_{i_1,\ldots,i_d}e_{i_1}\circ\ldots\circ{e_{i_d}}}_{=\mathcal{A}}\right) \end{align*}