Kronecker Product Packing Logic


Below are four examples of vector-matrix functions depending on the parameters $[a,b]^T$. In each one, the components of the resulting vector are first written separately and then combined into a single vector-matrix operation. I managed to do the combining, but I cannot grasp the general principle / rule / logic behind it!

Example 1: $x=\begin{bmatrix} a \\ b \end{bmatrix}$

$\begin{bmatrix} ([1] \otimes \frac{d}{da}x^T) \cdot x \\ ([1] \otimes \frac{d}{db}x^T) \cdot x \end{bmatrix}$

Joint operation:

$I_2 \cdot \begin{bmatrix} \frac{d}{da}x^T \\ \frac{d}{db}x^T \end{bmatrix} \cdot x$
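Since the Mathcad worksheet is not reproduced here, a quick NumPy sketch (with arbitrary test values $a=2$, $b=5$, which are assumptions, not from the original) can confirm that the joint operation matches the component-wise version:

```python
import numpy as np

# Sketch of Example 1 with arbitrary test values a=2, b=5 (NumPy used as a
# stand-in, since the Mathcad worksheet is not reproduced here).
a, b = 2.0, 5.0
x = np.array([a, b])               # x = [a, b]^T
dxT_da = np.array([1.0, 0.0])      # d/da x^T
dxT_db = np.array([0.0, 1.0])      # d/db x^T

# component-wise: each entry is ([1] (x) d/dp x^T) . x
componentwise = np.array([dxT_da @ x, dxT_db @ x])

# joint operation: I_2 . [d/da x^T ; d/db x^T] . x
D = np.vstack([dxT_da, dxT_db])
joint = np.eye(2) @ D @ x

assert np.allclose(componentwise, joint)   # both give [a, b]
```

Here $\frac{d}{da}x^T=[1,0]$ and $\frac{d}{db}x^T=[0,1]$, so the stacked derivative matrix is just $I_2$ and the result is $x$ itself.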

Example 2: $x=\begin{bmatrix} 3 \\ 8 \end{bmatrix},\ A=\begin{bmatrix} a^3 & 0 \\ 0 & b-a \end{bmatrix}$

$\begin{bmatrix} ([1] \otimes \frac{d}{da}A) \cdot x \\ ([1] \otimes \frac{d}{db}A) \cdot x \end{bmatrix}$

Joint operation:

$(I_2 \otimes I_2) \cdot \begin{bmatrix} \frac{d}{da}A \\ \frac{d}{db}A \end{bmatrix} \cdot x$
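The same kind of numerical sketch works here (again with assumed test values $a=2$, $b=5$): stacking the two derivative matrices into a $4\times 2$ block and multiplying by $x$ reproduces the two component blocks.

```python
import numpy as np

# Sketch of Example 2 with assumed test values a=2, b=5; x is the constant [3, 8]^T.
a, b = 2.0, 5.0
x = np.array([3.0, 8.0])
dA_da = np.array([[3*a**2, 0.0],   # d/da of [[a^3, 0], [0, b-a]]
                  [0.0,   -1.0]])
dA_db = np.array([[0.0, 0.0],      # d/db of the same matrix
                  [0.0, 1.0]])

# component-wise: stack the two 2-vectors (dA/dp) . x
componentwise = np.concatenate([dA_da @ x, dA_db @ x])

# joint operation: (I_2 (x) I_2) . [dA/da ; dA/db] . x
stacked = np.vstack([dA_da, dA_db])                 # 4x2 block matrix
joint = np.kron(np.eye(2), np.eye(2)) @ stacked @ x

assert np.allclose(componentwise, joint)   # [36, -8, 0, 8] for a=2
```

Note that $I_2 \otimes I_2 = I_4$, so in this example the Kronecker factor is actually superfluous; the block stacking does all the work.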

Example 3: $x=\begin{bmatrix} 3 \\ 8 \end{bmatrix},A=\begin{bmatrix} a^3 & 0 \\ 0 & b-a \end{bmatrix}$

$\begin{bmatrix} ([1] \otimes (x^T \cdot \frac{d}{da}A)) \cdot x \\ ([1] \otimes (x^T \cdot \frac{d}{db}A)) \cdot x \end{bmatrix}$

Joint operation:

$(I_2 \otimes x^T) \cdot \begin{bmatrix} \frac{d}{da}A \\ \frac{d}{db}A \end{bmatrix} \cdot x$
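Here the factor $I_2 \otimes x^T$ applies $x^T$ to each $2\times 2$ block separately. A sketch with the same assumed test values $a=2$, $b=5$:

```python
import numpy as np

# Sketch of Example 3 with assumed test values a=2, b=5; each component is
# the scalar x^T . (dA/dp) . x.
a, b = 2.0, 5.0
x = np.array([3.0, 8.0])
dA_da = np.array([[3*a**2, 0.0], [0.0, -1.0]])
dA_db = np.array([[0.0, 0.0], [0.0, 1.0]])

componentwise = np.array([x @ dA_da @ x, x @ dA_db @ x])

# joint operation: (I_2 (x) x^T) . [dA/da ; dA/db] . x
stacked = np.vstack([dA_da, dA_db])
joint = np.kron(np.eye(2), x.reshape(1, 2)) @ stacked @ x

assert np.allclose(componentwise, joint)   # [44, 64] for a=2
```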

Example 4: $x=\begin{bmatrix} a \\ b \end{bmatrix},A=\begin{bmatrix} 1 & 0 \\ -1 & 3 \end{bmatrix}$

$\begin{bmatrix} x^T \otimes (A \cdot \frac{d}{da}x) \\ x^T \otimes (A \cdot \frac{d}{db}x) \end{bmatrix} + \begin{bmatrix} \frac{d}{da}x^T \otimes (A \cdot x) \\ \frac{d}{db}x^T \otimes (A \cdot x) \end{bmatrix}$

Joint operation:

$x^T \otimes \begin{bmatrix} A \cdot \frac{d}{da}x \\ A \cdot \frac{d}{db}x \end{bmatrix}+\begin{bmatrix} \frac{d}{da}x^T \\ \frac{d}{db}x^T \end{bmatrix}\otimes (A \cdot x)$
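In this example it is $x$ that depends on the parameters, so the product rule produces two Kronecker terms. A sketch with assumed test values $a=2$, $b=5$ confirms that the two joint terms equal the stacked $2\times 2$ component blocks:

```python
import numpy as np

# Sketch of Example 4 with assumed test values a=2, b=5; here x = [a, b]^T
# varies and A is constant, so the product rule contributes two terms.
a, b = 2.0, 5.0
x = np.array([[a], [b]])                  # x as a 2x1 column
A = np.array([[1.0, 0.0], [-1.0, 3.0]])
dx_da = np.array([[1.0], [0.0]])          # d/da x
dx_db = np.array([[0.0], [1.0]])          # d/db x

def component(dx):
    # one component: x^T (x) (A.dx/dp) + (dx/dp)^T (x) (A.x), a 2x2 block
    return np.kron(x.T, A @ dx) + np.kron(dx.T, A @ x)

componentwise = np.vstack([component(dx_da), component(dx_db)])

# joint operation: x^T (x) [A.dx/da ; A.dx/db] + [dx^T/da ; dx^T/db] (x) (A.x)
joint = (np.kron(x.T, np.vstack([A @ dx_da, A @ dx_db]))
         + np.kron(np.vstack([dx_da.T, dx_db.T]), A @ x))

assert np.allclose(componentwise, joint)
```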

My question is the following: what is the principle of "stitching" or "packing" such operations together, so that one obtains the entire vector at once rather than each component separately (as shown in the Mathcad calculations)? Where does the transition from a simple vector-matrix product to a Kronecker product happen? From the calculations it is clear that some principle or logic for assembling the new block matrices is needed, but I can't quite grasp it. I would be very happy and grateful for any help!


https://dropmefiles.com/Xqqpa - the Mathcad worksheet.


There is 1 best solution below.


$
\def\A{{\cal A}}\def\B{{\cal B}}
\def\o{{\tt1}}\def\p{\partial}
\def\LR#1{\left(#1\right)}
\def\op#1{\operatorname{#1}}
\def\Diag#1{\op{Diag}\LR{#1}}
\def\vc#1{\op{vec}\LR{#1}}
\def\qiq{\quad\implies\quad}
\def\grad#1#2{\frac{\p #1}{\p #2}}
\def\m#1{\left[\begin{array}{r}#1\end{array}\right]}
\def\c#1{\color{red}{#1}}
\def\CLR#1{\c{\LR{#1}}}
\def\fracLR#1#2{\LR{\frac{#1}{#2}}}
$Let's look at the scalar-valued function in your first example.
$$\eqalign{
\phi &= x\cdot x \\
d\phi &= dx\cdot x + x\cdot dx \;\equiv\; \c{2x}\cdot dx \\
\grad{\phi}{x} &= \c{2x}
 \qiq \grad{\phi}{a}=2a,\quad\grad{\phi}{b}=2b \\
}$$
The function $(f)$ in the second example is vector-valued, so let's introduce the cartesian basis vectors $\{e_k\}$
$$\eqalign{
y &= a^3e_1 + (b-a)e_2
 \qiq dy=(3a^2e_1-e_2)\,da + e_2\,db \\
f &= \Diag y\cdot x \;\equiv\; \Diag x\cdot y \\
df&= \Diag x\cdot {dy}
 = \Diag x\cdot \LR{(3a^2e_1-e_2)\,\c{da} + e_2\,\c{db}} \\
\grad{f}{a} &= \Diag x\cdot (3a^2e_1-e_2)
 \;\equiv\; (3a^2x_1e_1-x_2e_2) \\
\grad{f}{b} &= \Diag x\cdot e_2 \;\equiv\; x_2e_2 \\
J &= \m{\grad{f}{a}&\grad{f}{b}}
 = \m{3a^2x_1&0\\-x_2&x_2}\qquad
 \big\{{\rm Jacobian\:Matrix}\big\} \\
}$$
Your "Joint Operation 2" is written more conventionally as $\,\vc{J}$
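This Jacobian can be sanity-checked numerically, e.g. by comparing it against a central finite difference of $f$ (the values $a=2$, $b=5$ below are assumed test inputs, not from the answer):

```python
import numpy as np

# Finite-difference check of the Jacobian J = [df/da, df/db] derived above,
# using assumed test values a=2, b=5 and the constant x = [3, 8]^T.
x = np.array([3.0, 8.0])

def f(a, b):
    # f = Diag(y) . x with y = [a^3, b - a]
    y = np.array([a**3, b - a])
    return np.diag(y) @ x

a, b = 2.0, 5.0
J_analytic = np.array([[3*a**2 * x[0], 0.0],
                       [-x[1],         x[1]]])

h = 1e-6
J_fd = np.column_stack([(f(a + h, b) - f(a - h, b)) / (2*h),
                        (f(a, b + h) - f(a, b - h)) / (2*h)])

assert np.allclose(J_analytic, J_fd, atol=1e-4)
# vec(J), i.e. column-major flattening, reproduces "Joint Operation 2"
vecJ = J_analytic.T.reshape(-1)            # [36, -8, 0, 8] for a=2
```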

The general approach to calculating the gradient is to first calculate the differential of the function, then transform it into a gradient by rearranging the differential expression until it has the following form $$\eqalign{ dF_{\c{ijk}} &= \sum_p\sum_q\sum_r\sum_s G_{\c{ijk}\,pqrs}\:dX_{pqrs} \\ }$$ where the free index grouping $\{ijk\}$ appears in the same order on $\{F,G\}$, while the summed (aka dummy) index grouping $\{pqrs\}$ must appear with the same ordering on $\{G,dX\}$.

The tensor $G$ can now be identified as the desired gradient $$\eqalign{ G_{ijk\,pqrs} = \grad{F_{ijk}}{X_{pqrs}} \\ }$$ You seem to be trying to write everything in terms of Kronecker products, but that is not a general technique. Rather, it is a specialized "trick" that can be used in some situations to flatten matrices into vectors, e.g. $$v = \vc{AXB} \;\equiv\; \LR{B^T\otimes A}\vc{X}\\$$ A truly general relationship is the following $$d\LR{\A\star\B} = d\A\star\B + \A\star d\B$$ where $\{\A,\B\}$ can be tensors of different orders (i.e. scalar, vector, matrix, etc) and $\{\star\}$ can be any product (Kronecker, Hadamard, cross, etc) with which they are compatible.
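The flattening identity is easy to verify numerically; note that $\op{vec}$ means column-major (Fortran-order) flattening, so NumPy's default row-major `reshape` would not match. A sketch on random matrices:

```python
import numpy as np

# Check of the identity vec(A.X.B) = (B^T (x) A).vec(X) on random matrices.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# vec() is column-major (Fortran-order) flattening
vec = lambda M: M.reshape(-1, order="F")

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)

assert np.allclose(lhs, rhs)
```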