Derivative of Kronecker product inside a Frobenius norm

99 Views Asked by At

I need help to find the derivative w.r.t. to $ X $ in the problem below:

$$ \min_X \Vert A - (I \otimes X) \Vert_F^2 $$

where $A $ is a complex matrix, $ I $ is the identity matrix, and $\otimes$ denotes the Kronecker product.

The problem seems to be similar to this question but the trick of using 'vec' operarator does not work here.

2

There are 2 best solutions below

0
On BEST ANSWER

The bracket term is a block-diagonal matrix with $\mathbf{X}$ repeated along diagonal. We can thus write $$ \phi(\mathbf{X}) = \sum_{p=1}^P \| \mathbf{X} - \mathbf{A}_{pp} \|_F^2 $$ with $\mathbf{A}_{pp}$ the $p$-th diagonal block matrix (same size $M\times N$ as $\mathbf{X}$), explicitly computed as $$ \mathbf{A}_{pp} = \left( \mathbf{e}_p \otimes \mathbf{I}_M \right)^T \mathbf{A} \left( \mathbf{e}_p \otimes \mathbf{I}_N \right) $$

The gradient is easily found as $$ \frac{\partial \phi}{\partial \mathbf{X}} = \sum_{p=1}^P \left( \mathbf{X} - \mathbf{A}_{pp} \right) $$

0
On

$ \def\o{{\tt1}}\def\p{\partial} \def\LR#1{\left(#1\right)} \def\BR#1{\Bigg(#1\Bigg)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\sz#1{\operatorname{size}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\S{\sum_{k=1}^p} $Let's introduce the matrix variable $$ B = (I_p\otimes X) - A $$ as well as Frobenius product notation $(:)\,$ which is a convenient way to write the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A^*:A &= \|A\|^2_F \\ }$$ The properties of the underlying trace function allow the terms in such a product to be rearranged in many different but equivalent ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:\LR{AB} &= \LR{CB^T}:A \\&= \LR{A^TC}:B \\ }$$


Assume that the size of the various matrix variables are $$\eqalign{ p,p &= \sz{I_p} \\ m,n &= \sz X \\ pm,pn &= \sz A \;=\; \sz B \\ }$$ and let $e_k$ denote the $k^{th}$ column of $I_p\,\big($i.e.$\,$ the standard basis vectors for ${\mathbb R}^{p}\big).$

Use the above notation to write the objective function, then calculate its differential and gradient. $$\eqalign{ \phi &= B^*:B \\ d\phi &= B^*:dB \\ &= B^*:(I_p\otimes dX) \\ &= \LR{\S(e_k\otimes I_m)^T\c{B^*}(e_k\otimes I_n)}:dX \\ &= \LR{\S(e_k\otimes I_m)^T\Big(\c{(I_p\otimes X^*)-A^*}\Big)(e_k\otimes I_n)}:dX \\ &= \S\BR{X^*-(e_k\otimes I_m)^TA^*(e_k\otimes I_n)}:dX \\ \grad{\phi}{X} &= \LR{pX^*-\S(e_k\otimes I_m)^TA^*(e_k\otimes I_n)} \\ }$$ Setting the gradient to zero (and taking its conjugate) yields $$\eqalign{ X &= \frac 1p \, \S(e_k\otimes I_m)^TA(e_k\otimes I_n) \\\\ }$$


The following result was implicitly used above
$$\eqalign{ &{\S(e_k\otimes I_m)\,\c{dX}\,(e_k\otimes I_n)^T} &= {\S(e_k\otimes I_m)\LR{I_\o\otimes dX}(e_k\otimes I_n)^T} \\ &&= {\S\LR{e_k\,I_\o\,e_k^T}\otimes\LR{I_m\,dX\,I_n^T}} \\ &&= {I_p\otimes dX} \\ }$$ where $I_\o = \big[ \o \big] \in {\mathbb R}^{\o\times\o}$