Does the derivative with respect to a matrix have a Kronecker product matrix representation?


I'm confused about why I end up with two matrices that are transposes of each other when I take the tensor inner product of a third-order tensor with a vector, using two different Kronecker product representations that I believe should be equal.

Let $A$ be a matrix and $x$ and $y$ be vectors. Using index notation (with summation over repeated indices), $$ y_i = A_{iq}x_q $$ $$ \begin{aligned} \frac{\partial{y_i}}{\partial{A_{jp}}} &= \frac{\partial{(A_{iq}x_q)}}{\partial{A_{jp}}} \\ &= \delta_{ij}\delta_{qp}x_q \\ &= \delta_{ij}x_p \end{aligned} $$ So the derivative is a third-order tensor, and with $\otimes$ denoting the tensor product, $$ \frac{\partial{y}}{\partial{A}} = I \otimes x $$ But since $$ \delta_{ij}x_p = x_p\delta_{ij} $$
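As a numerical sanity check (a NumPy sketch, not part of the original question), the derivative tensor $\delta_{ij}x_p$ can be compared against finite differences of $y = Ax$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
x = rng.standard_normal(2)

# Analytic derivative: D[i, j, p] = d y_i / d A_{jp} = delta_{ij} * x_p
D = np.einsum('ij,p->ijp', np.eye(2), x)

# Finite-difference check: perturb each entry A_{jp} separately
eps = 1e-6
D_num = np.zeros((2, 2, 2))
for j in range(2):
    for p in range(2):
        Ap = A.copy()
        Ap[j, p] += eps
        D_num[:, j, p] = (Ap @ x - A @ x) / eps

assert np.allclose(D, D_num, atol=1e-4)
```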

$$ \frac{\partial{y}}{\partial{A}} = x \otimes I $$ But the tensor product is not commutative. Also, the Kronecker product representations are different. For example, assume $x$ is $2 \times1$ and $I$ is $2 \times 2$. $$ x_p\delta_{ij} = \begin{bmatrix}x_1 \\ x_2\end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} x_1 & 0 \\ 0 & x_1 \\ x_2 & 0 \\ 0 & x_2 \end{bmatrix} = A_{pij} $$ $$ \delta_{ij}x_p = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \otimes \begin{bmatrix}x_1 \\ x_2\end{bmatrix} = \begin{bmatrix} x_1 & 0 \\ x_2 & 0 \\ 0 & x_1 \\ 0 & x_2 \end{bmatrix} = B_{ijp} $$
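The two stackings above are easy to reproduce with `numpy.kron` (a quick check with the same $2\times1$ example values, labeled $x_1=1$, $x_2=2$ here for concreteness):

```python
import numpy as np

x = np.array([[1.0], [2.0]])   # x as a 2x1 column
I = np.eye(2)

A_mat = np.kron(x, I)  # x ⊗ I : [[1,0],[0,1],[2,0],[0,2]]
B_mat = np.kron(I, x)  # I ⊗ x : [[1,0],[2,0],[0,1],[0,2]]

# Both are 4x2, but they are different stackings of the same 8 entries
assert not np.array_equal(A_mat, B_mat)
```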

If I now take a tensor inner product of $A$ or $B$ with $z$, a $2 \times 1$ vector: $$ \begin{aligned} (x \otimes I) \cdotp z &= A_{pij}z_k\delta_{jk} \\&= A_{pij}z_j \\&= C_{pi} \\&= \begin{bmatrix} A_{111}z_1 + A_{112}z_2 & A_{121}z_1 + A_{122}z_2 \\ A_{211}z_1 + A_{212}z_2 & A_{221}z_1 + A_{222}z_2 \end{bmatrix} \\&= \begin{bmatrix} x_1z_1 & x_1z_2 \\ x_2z_1 & x_2z_2 \end{bmatrix} \\&= xz^T \end{aligned} $$

$$ \begin{aligned} (I \otimes x) \cdotp z &= z_kB_{ijp}\delta_{ik} \\&= z_iB_{ijp} \\&= D_{jp} \\&= \begin{bmatrix} z_1B_{111} + z_2B_{211} & z_1B_{112} + z_2B_{212} \\ z_1B_{121} + z_2B_{221} & z_1B_{122} + z_2B_{222} \\ \end{bmatrix} \\&= \begin{bmatrix} z_1x_1 & z_1x_2 \\ z_2x_1 & z_2x_2 \end{bmatrix} \\&= zx^T \end{aligned} $$
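The two contractions can be verified with `numpy.einsum` (a sketch with arbitrary example values; the index strings mirror the notation above):

```python
import numpy as np

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
I = np.eye(2)

# Third-order tensors with the two index orderings from the question
A_t = np.einsum('p,ij->pij', x, I)   # A_{pij} = x_p delta_{ij}
B_t = np.einsum('ij,p->ijp', I, x)   # B_{ijp} = delta_{ij} x_p

C = np.einsum('pij,j->pi', A_t, z)   # contract the last index of A
D = np.einsum('ijp,i->jp', B_t, z)   # contract the first index of B

assert np.allclose(C, np.outer(x, z))  # x z^T
assert np.allclose(D, np.outer(z, x))  # z x^T
```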

Does it even make sense to represent a third-order tensor with a Kronecker product, and what am I missing?


There are two answers below.

Answer 1

Let's try to use index notation: $$ y_i = {A^p}_i x_p $$ So, for example, $$ \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} {A^1}_1 & {A^2}_1 \\ {A^1}_2 & {A^2}_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} $$ Now we write the differential, $$ dy_i = d{A^p}_i x_p = \delta^p_{p'}\delta^{i'}_i x_p d{A^{p'}}_{i'} = \delta^{i'}_i x_{p'} d{A^{p'}}_{i'} $$ and, as expected, $\partial y /\partial A$ is a third-order tensor $\delta^{i'}_i x_{p'}$.

What is the structure of this tensor? Using the setup in the example, $$ {\left(\frac{\partial y}{\partial A}\right)^{i'}}_{ip'}= \begin{bmatrix} \begin{bmatrix} x_1 & 0 \\ x_2 & 0 \end{bmatrix} \\ \\ \begin{bmatrix} 0 & x_1 \\ 0 & x_2 \end{bmatrix} \end{bmatrix} $$ So for $i=1$ we have a matrix whose first column is $\pmb{x}$ and whose second column is zero. For $i=2$ we have the reverse.

Is there a way to convert this to a Kronecker product? First $\pmb{x}\otimes\pmb{I}$: $$ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} =\begin{bmatrix} x_1 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\ \\ x_2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} x_1 & 0 \\ 0 & x_1 \end{bmatrix} \\ \\ \begin{bmatrix} x_2 & 0 \\ 0 & x_2 \end{bmatrix} \end{bmatrix} $$ which does not work. Next $\pmb{I} \otimes \pmb{x}$: $$ \begin{bmatrix} 1 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} & 0\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \\ \\ 0 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} & 1\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} x_1 & 0 \\ x_2 & 0 \end{bmatrix} \\ \\ \begin{bmatrix} 0 &x_1 \\ 0 & x_2 \end{bmatrix} \end{bmatrix} $$ which works.
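This identification can be checked numerically (a sketch; it assumes NumPy's row-major `reshape` to reinterpret the $4\times2$ Kronecker matrix as the $2\times2\times2$ block structure shown above):

```python
import numpy as np

x = np.array([1.0, 2.0])
I = np.eye(2)

# I ⊗ x as a 4x2 matrix, reinterpreted as a 2x2x2 block structure
T_from_kron = np.kron(I, x.reshape(-1, 1)).reshape(2, 2, 2)

# The derivative tensor delta^{i'}_i x_{p'}, indexed as [i, p', i']
T_direct = np.einsum('ij,p->ipj', np.eye(2), x)

assert np.allclose(T_from_kron, T_direct)
```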

Since $\delta^{i'}_i x_{p'}$ is a third order tensor, its inner product with a vector $z_{i'}$ is, $$ \delta^{i'}_i x_{p'} z_{i'} = z_i x_{p'} $$ so we have a structure like, $$ \begin{bmatrix} z_1 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\\ \\ z_2 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} z_1 x_1 \\ z_1 x_2 \end{bmatrix}\\ \\ \begin{bmatrix} z_2 x_1 \\ z_2 x_2 \end{bmatrix} \end{bmatrix} $$ This can be summarised as, $$ (\pmb{I} \otimes \pmb{x}) \cdot \pmb{z}= \pmb{z}\otimes \pmb{x} $$ The only issue is that unless you are very familiar with the notation of matrix algebra (I am not) it is not straightforward to interpret how $\cdot$ acts on a Kronecker product.
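The identity $(\pmb{I} \otimes \pmb{x}) \cdot \pmb{z}= \pmb{z}\otimes \pmb{x}$ also holds at the level of flat Kronecker matrices: treating $I\otimes x$ as a $4\times2$ matrix acting on $z$ yields the long vector $z\otimes x$. A quick NumPy check:

```python
import numpy as np

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# (I ⊗ x) as a 4x2 matrix acting on z gives the long vector z ⊗ x
lhs = np.kron(np.eye(2), x.reshape(-1, 1)) @ z
rhs = np.kron(z, x)

assert np.allclose(lhs, rhs)
```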

It is not true that $\delta^{i'}_i x_{p'}=x_{p'}\delta^{i'}_i$; although tensor elements have this property (since they are scalars) it is not the case that $\delta^{i'}_i x_{p'}$ is the same object as $x_{p'}\delta^{i'}_i$. The Kronecker product $\pmb{a}\otimes\pmb{b}$ in index notation can be written as $a_i b_j$; given $i,j$, $a_ib_j=b_ja_i$. But tensor $T_{ji}=b_ja_i \neq a_ib_j=T_{ij}$ unless $T_{ij}$ is symmetric.

Answer 2

$ \def\l{\ell} \def\bb{\mathbb} \def\kp{\otimes} \def\hp{\odot} \def\dp{\star} \def\o{{\tt1}} \def\E{{\cal E}} \def\F{{\cal F}} \def\G{{\cal G}} \def\M{{\cal M}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vecc#1{\op{vec}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\mt{\mapsto} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\m#1{\left[\begin{array}{r}#1\end{array}\right]} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} \def\wrt#1{\LR{{\rm wrt}\;#1}} $The dyadic product $(\dp)$ of $n$ vectors creates an $n^{th}$ order tensor, e.g. $$\eqalign{ \M = e\dp f\dp g\dp h \qiq \M_{ijk\l} = e_i\, f_j\, g_k\, h_\l }$$ whereas their Kronecker product $(\kp)$ merely creates a very long vector.
Similarly, the Kronecker product of $n$ matrices creates a very big matrix.
In short, you cannot construct higher-order tensors using only Kronecker products.

On the other hand, you can use Kronecker products to vectorize an equation $$\eqalign{ a = \vecc{A},\;y \equiv \vecc{y} \qiq y=\vecc{Ax} = \CLR{x^T\kp I}a \\ }$$ The gradient of this equation $\wrt a$ is obviously the matrix in $\c{\rm red.}$
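The vectorization identity above is easy to confirm numerically (a sketch; note that $\vecc{\cdot}$ stacks columns, i.e. column-major order in NumPy terms):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([5.0, 6.0])

a = A.flatten(order='F')           # vec(A): stack the columns
lhs = A @ x
rhs = np.kron(x, np.eye(2)) @ a    # (x^T ⊗ I) vec(A)

assert np.allclose(lhs, rhs)
```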

One more product (indispensable for doing Matrix Calculus) is the double-dot product $(:)$ $$ F = \M:A \qiq F_{ij} = \sum_k\sum_\l\M_{ijk\l}\,A_{k\l} $$ When applied to matrices, this simplifies to the trace function $$\eqalign{ M:A = \sum_k\sum_\l M_{k\l}\,A_{k\l} \;=\; \trace{M^TA} \\ }$$
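Both forms of the double-dot product translate directly into `einsum` contractions (a sketch with random example tensors):

```python
import numpy as np

rng = np.random.default_rng(1)
M4 = rng.standard_normal((2, 2, 2, 2))  # fourth-order tensor
M2 = rng.standard_normal((2, 2))        # matrix
A = rng.standard_normal((2, 2))

F = np.einsum('ijkl,kl->ij', M4, A)     # F = M : A  (fourth-order case)
s = np.einsum('kl,kl->', M2, A)         # M : A for matrices, a scalar

# For matrices the double-dot product reduces to the trace form
assert np.allclose(s, np.trace(M2.T @ A))
```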


Using the ideas above, the gradient of the original equation $\wrt A$ is easy to calculate $$\eqalign{ y &= Ax \;=\; \LR{I\dp x}:A \\ dy &= \CLR{I\dp x}:dA &\{ {\sf differential} \} \\ \grad yA &= \LR{I\dp x} \qiq&\grad{y_i}{A_{jk}} = \delta_{ij}\,x_k \\ }$$ which is obviously a $3^{rd}$ order tensor.
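The identity $y = \LR{I\dp x}:A$ can be verified directly (a sketch; `einsum` builds the dyadic tensor $\delta_{ij}x_k$ and performs the double-dot contraction):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2))
x = rng.standard_normal(2)

# (I ⋆ x) is the third-order tensor delta_{ij} x_k
T = np.einsum('ij,k->ijk', np.eye(2), x)

# y = (I ⋆ x) : A  reproduces  y = A x
y = np.einsum('ijk,jk->i', T, A)

assert np.allclose(y, A @ x)
```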

Update

Note that $\LR{x^T\kp I}\ne\LR{I\dp x}$ since the LHS is a matrix (a second-order tensor) while the RHS is a third-order tensor. These objects are not dimensionally compatible.

However, there is a one-to-one mapping of every component on the RHS to every component on the LHS, e.g. $$\LR{x^T\kp I}_{(\o,\o)}=\LR{I\dp x}_{(\o,\o,\o)}$$ Similarly, $\vecc A\ne A$ although a one-to-one mapping also exists in this situation.

Another example that often trips people up is $\:\vecc A\ne\vecc{A^T}\ne\vecc{A}^T$
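A small NumPy illustration of this last point (column-major flattening plays the role of $\op{vec}$; the example values are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])

vec_A = A.flatten(order='F')      # vec(A):   stack columns -> [1, 3, 2, 4]
vec_At = A.T.flatten(order='F')   # vec(A^T): stack rows    -> [1, 2, 3, 4]

# vec(A) and vec(A^T) reorder the same entries differently,
# and vec(A)^T is a row vector rather than a column.
assert not np.array_equal(vec_A, vec_At)
```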