Derivative with respect to vectorized inverse Kronecker product


I am trying to derive the gradient of a function I wish to optimize, and need the following derivative: $$ \frac{\partial}{\partial \pmb{x}} \left(\pmb{I} - \pmb{X} \otimes \pmb{X} \right)^{-1} \pmb{y} $$ where $\pmb{x} = \mathrm{vec}(\pmb{X})$, $\pmb{X}$ is a square asymmetric matrix, $\pmb{y}$ is a vector that does not depend on $\pmb{x}$, and $\otimes$ denotes the Kronecker product. My thought was to first rewrite the expression as: $$ \left( \pmb{y}^{\top} \otimes \pmb{I} \right) \mathrm{vec}\left( \left(\pmb{I} - \pmb{X} \otimes \pmb{X} \right)^{-1}\right) $$ then to let $\pmb{f} = \mathrm{vec}\left( \left(\pmb{I} - \pmb{X} \otimes \pmb{X} \right)^{-1}\right)$ and express the differential of $\pmb{f}$. I got to: $$ d\pmb{f} = \left(\left(\pmb{I} - \pmb{X} \otimes \pmb{X} \right)^{-\top} \otimes \left(\pmb{I} - \pmb{X} \otimes \pmb{X} \right)^{-1}\right) \left( \mathrm{vec}\left( (d\pmb{X}) \otimes \pmb{X} \right) + \mathrm{vec}\left( \pmb{X} \otimes (d\pmb{X})\right) \right) $$ in which $-\top$ denotes the transpose of the inverse. This seems close to the answer, but it is not quite there yet. I am getting stuck trying to express $\mathrm{vec}\left( (d\pmb{X}) \otimes \pmb{X} \right)$ in terms of $d\pmb{x}$.

Edit: continuing this, I recognized there must be some permutation matrix $\pmb{P}$ such that: $$ \pmb{P}\,\mathrm{vec}( (d\pmb{x})\pmb{x}^{\top} ) = \mathrm{vec}((d\pmb{X}) \otimes \pmb{X}) $$ which I can use to further derive: $$ \begin{aligned} d\pmb{f} &= \left(\left(\pmb{I} - \pmb{X} \otimes \pmb{X} \right)^{-\top} \otimes \left(\pmb{I} - \pmb{X} \otimes \pmb{X} \right)^{-1}\right)\pmb{P}\left((\pmb{x} \otimes \pmb{I}) + (\pmb{I} \otimes \pmb{x})\right)d\pmb{x} \\ \frac{\partial \pmb{f}}{\partial \pmb{x}} &= \left(\left(\pmb{I} - \pmb{X} \otimes \pmb{X} \right)^{-\top} \otimes \left(\pmb{I} - \pmb{X} \otimes \pmb{X} \right)^{-1}\right) \pmb{P}\left((\pmb{x} \otimes \pmb{I}) + (\pmb{I} \otimes \pmb{x})\right). \end{aligned} $$ This seems plausible. Thus, all that seems to be needed is an expression for $\pmb{P}$. I guess that will take a similar form as this answer, but I am not sure about it.
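One possible way to build such a $\pmb{P}$ and check it numerically is sketched below. It assumes column-major $\mathrm{vec}$ and relies on the standard identity $\mathrm{vec}(A\otimes B) = (I\otimes K\otimes I)(\mathrm{vec}A\otimes\mathrm{vec}B)$ for square matrices, where $K$ is the commutation matrix. The helper names `vec`, `commutation` and `perm_P` are made up for this illustration and do not come from the post.

```python
import numpy as np

def vec(Z):
    """Column-stacking vectorization (column-major order)."""
    return Z.reshape(-1, order="F")

def commutation(m, n):
    """Return K with K @ vec(Z) == vec(Z.T) for any Z of shape (m, n)."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # Z[i, j] sits at i + j*m in vec(Z) and at j + i*n in vec(Z.T)
            K[j + i * n, i + j * m] = 1.0
    return K

def perm_P(n):
    """Candidate P with P @ vec(outer(dx, x)) == vec(kron(dX, X))."""
    I = np.eye(n)
    # Q maps kron(vec(A), vec(B)) to vec(kron(A, B)) for n x n matrices A, B
    Q = np.kron(I, np.kron(commutation(n, n), I))
    # vec(outer(dx, x)) equals kron(x, dx); K_big swaps it to kron(dx, x)
    K_big = commutation(n * n, n * n)
    return Q @ K_big

rng = np.random.default_rng(0)
n = 3
X, dX = rng.standard_normal((n, n)), rng.standard_normal((n, n))
x, dx = vec(X), vec(dX)
P = perm_P(n)

# First term of the differential: vec(dX kron X)
print(np.allclose(P @ vec(np.outer(dx, x)), vec(np.kron(dX, X))))
# Second term: the same P applied to kron(dx, x) gives vec(X kron dX)
print(np.allclose(P @ np.kron(dx, x), vec(np.kron(X, dX))))
```

The second check corresponds to the $(\pmb{I}\otimes\pmb{x})\,d\pmb{x}$ term, so under these assumptions a single $\pmb{P}$ serves both terms.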


There are 2 best solutions below

Best answer:

Let $X\in {\mathbb R}^{n\times n}$ and $E$ be the identity matrix of the same size.
Let's also denote the $k^{th}$ column of $X$ by $x_k$.

Define the matrices $$\eqalign{ A &= E\otimes E - X\otimes X \cr M &= \pmatrix{E\otimes x_1\cr E\otimes x_2\cr\vdots\cr E\otimes x_n} \cr }$$ so that $M$ has size $n^3\times n$. Calculate the differential of $A$. $$\eqalign{ dA &= -(X\otimes dX+dX\otimes X) \cr da &= {\rm vec}(dA) = -(M\otimes E+E\otimes M)\,dx \cr }$$ Now we can answer the question. $$\eqalign{ w &= A^{-1}y \cr dw &= d(A^{-1})\,y \cr &= -A^{-1}\,dA\,A^{-1}y \cr &= -A^{-1}\,dA\,w \cr &= -{\rm vec}(A^{-1}\,dA\,w) \cr &= -(w^T\otimes A^{-1})\,da \cr &= (w^T\otimes A^{-1})\,(M\otimes E+E\otimes M)\,dx \cr \frac{\partial w}{\partial x} &= (w^T\otimes A^{-1})\,(M\otimes E+E\otimes M) \cr }$$
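As a sanity check, the formula above can be compared against central finite differences. This is only a minimal sketch, assuming column-major ${\rm vec}$ and a randomly drawn $X$ scaled small enough that $A$ is well conditioned. The function names (`solve_w`, `jacobian_via_M`, `finite_difference_jacobian`) are illustrative choices, not part of the answer.

```python
import numpy as np

def solve_w(X, y):
    """w = A^{-1} y with A = E kron E - X kron X."""
    n = X.shape[0]
    A = np.eye(n * n) - np.kron(X, X)
    return np.linalg.solve(A, y)

def jacobian_via_M(X, y):
    """dw/dx = (w^T kron A^{-1}) (M kron E + E kron M)."""
    n = X.shape[0]
    E = np.eye(n)
    A = np.eye(n * n) - np.kron(X, X)
    w = np.linalg.solve(A, y)
    # M stacks the blocks E kron x_k, one per column x_k of X (n^3 x n)
    M = np.vstack([np.kron(E, X[:, [k]]) for k in range(n)])
    S = np.kron(M, E) + np.kron(E, M)          # n^4 x n^2
    return np.kron(w[None, :], np.linalg.inv(A)) @ S

def finite_difference_jacobian(X, y, h=1e-6):
    """Central differences with respect to x = vec(X), column-major."""
    n = X.shape[0]
    J = np.zeros((n * n, n * n))
    for c in range(n * n):
        step = np.zeros(n * n)
        step[c] = h
        dX = step.reshape(n, n, order="F")
        J[:, c] = (solve_w(X + dX, y) - solve_w(X - dX, y)) / (2 * h)
    return J

rng = np.random.default_rng(0)
n = 3
X = 0.2 * rng.standard_normal((n, n))   # small entries keep A well conditioned
y = rng.standard_normal(n * n)

J_formula = jacobian_via_M(X, y)
J_numeric = finite_difference_jacobian(X, y)
print("max abs difference:", np.abs(J_formula - J_numeric).max())
```

Note that `S` already has $n^4$ rows, so this direct construction is only practical for small $n$.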

Another answer:

Following Greg's variable naming and approach (up to a point), we have $$\eqalign{ A &= E\otimes E-X\otimes X \\ dA &= -(dX\otimes X+X\otimes dX)\\ w &= A^{-1}y \\ dw &= -A^{-1}\,dA\,A^{-1}y \\ &= A^{-1}\,(dX\otimes X+X\otimes dX)\,w \\ &= A^{-1}\,{\rm vec}(XW\,dX^T + dX\,WX^T) \\ &= A^{-1}\Big((E\otimes XW)K + (XW^T\otimes E)\Big)\,dx \\ \frac{\partial w}{\partial x} &= A^{-1}\Big((E\otimes XW)K + (XW^T\otimes E)\Big) \\ }$$ where $W$ is the $n\times n$ matrix such that $w={\rm vec}(W)$,
and $K$ is the permutation matrix (aka commutation matrix) associated with Kronecker products, which satisfies $K\,{\rm vec}(Z)={\rm vec}(Z^T)$ for any $n\times n$ matrix $Z$. The step from the Kronecker products to the ${\rm vec}$ form uses the identity $(B^T\otimes C)\,{\rm vec}(W)={\rm vec}(CWB)$.

This result seems simpler than constructing and using Greg's $M$ matrix: every factor here is $n^2\times n^2$, whereas $M\otimes E$ and $E\otimes M$ are $n^4\times n^2$.
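The commutation-matrix form can be checked numerically in the same spirit. The sketch below assumes column-major ${\rm vec}$ and a small random $X$. The names `commutation`, `jacobian_via_K` and `finite_difference_jacobian` are hypothetical helpers written for this check.

```python
import numpy as np

def commutation(m, n):
    """Return K with K @ vec(Z) == vec(Z.T) for any Z of shape (m, n)."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # Z[i, j] sits at i + j*m in vec(Z) and at j + i*n in vec(Z.T)
            K[j + i * n, i + j * m] = 1.0
    return K

def solve_w(X, y):
    """w = A^{-1} y with A = E kron E - X kron X."""
    n = X.shape[0]
    return np.linalg.solve(np.eye(n * n) - np.kron(X, X), y)

def jacobian_via_K(X, y):
    """dw/dx = A^{-1} ((E kron XW) K + (X W^T kron E))."""
    n = X.shape[0]
    E = np.eye(n)
    A = np.eye(n * n) - np.kron(X, X)
    w = np.linalg.solve(A, y)
    W = w.reshape(n, n, order="F")             # w = vec(W), column-major
    G = np.kron(E, X @ W) @ commutation(n, n) + np.kron(X @ W.T, E)
    return np.linalg.solve(A, G)               # avoids forming A^{-1} explicitly

def finite_difference_jacobian(X, y, h=1e-6):
    """Central differences with respect to x = vec(X), column-major."""
    n = X.shape[0]
    J = np.zeros((n * n, n * n))
    for c in range(n * n):
        step = np.zeros(n * n)
        step[c] = h
        dX = step.reshape(n, n, order="F")
        J[:, c] = (solve_w(X + dX, y) - solve_w(X - dX, y)) / (2 * h)
    return J

rng = np.random.default_rng(1)
n = 4
X = 0.2 * rng.standard_normal((n, n))   # small entries keep A well conditioned
y = rng.standard_normal(n * n)

J_formula = jacobian_via_K(X, y)
J_numeric = finite_difference_jacobian(X, y)
print("max abs difference:", np.abs(J_formula - J_numeric).max())
```

Because only $n^2\times n^2$ arrays appear, this version scales to noticeably larger $n$ than a construction that forms $n^4\times n^2$ matrices.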