Derivative of a matrix function involving a Kronecker product

In the derivation of an estimator, I'm meant to find the minimum of the following matrix scalar function:

$$\underset{\beta}{\operatorname{argmin}}\ \big[S Y^\prime M^\prime - SX^\prime (I_N \otimes \beta) M^\prime\big]^\prime \, \big[S Y^\prime M^\prime - SX^\prime (I_N \otimes \beta) M^\prime\big]$$

where $S$ is $q \times T$, $Y$ is $N \times T$, $M$ is $1 \times N$, $\beta$ is $k \times 1$, $X$ is $Nk \times T$, $A \otimes B$ denotes the Kronecker product, and $I_N$ is the $N \times N$ identity matrix.

Now I'm quite confused about how to find the $k \times 1$ gradient, from which I could then obtain a closed-form solution for $\beta$. By deduction I think the derivative of $I_N \otimes \beta$ should involve $I_N \otimes I_k$, but I'm not sure how to make this work.
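One useful observation: the map $\beta \mapsto I_N \otimes \beta$ is linear in $\beta$, so its differential in a direction $h$ is simply $I_N \otimes h$. This is easy to confirm numerically (a NumPy sketch; the dimensions here are arbitrary small test values, not from the original problem):

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 3, 2

beta = rng.standard_normal((k, 1))
h = rng.standard_normal((k, 1))
t = 1e-6

# Finite-difference quotient of beta -> kron(I_N, beta) in direction h ...
lhs = (np.kron(np.eye(N), beta + t * h) - np.kron(np.eye(N), beta)) / t
# ... equals kron(I_N, h), since the map is linear in beta
rhs = np.kron(np.eye(N), h)
assert np.allclose(lhs, rhs, atol=1e-6)
```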

If you could suggest a reference where I can read up on this kind of problem (preferably one with plenty of examples), or give hints on how to solve it, I would be most grateful.

Regards,

B.

There are two answers below.


Let $f(\beta)$ denote the function under consideration. Its derivative at $\beta$ is the linear map $Df_{\beta}:h\in \mathbb{R}^k\mapsto 2\big(SY^TM^T-SX^T(I\otimes \beta)M^T\big)^T\big(-SX^T(I\otimes h)M^T\big)$. In particular, $\dfrac{\partial f}{\partial \beta_i}=2\big(SY^TM^T-SX^T(I\otimes \beta)M^T\big)^T\big(-SX^T(I\otimes e_i)M^T\big)$, where $e_i=[0,\dots,1,\dots,0]^T$ is the $i$-th standard basis vector.

EDIT. (Answer to user213240). It is complicated. The derivative has the form $Df_{\beta}(h)=\operatorname{trace}(U(I\otimes h)M^T)=\operatorname{trace}(M^TU(I\otimes h))=\operatorname{trace}(W(I\otimes h))$, where $W\in M_{N,Nk}$ and $I\otimes h=\operatorname{diag}(h,\cdots,h)$ (the block-diagonal matrix containing $N$ copies of $h$). To minimize $f$ you must solve $Df_{\beta}(h)=0$ for every $h$. Let $N=k=2$; then $W=[w_{i,j}]\in M_{2,4}$ and $I\otimes h=\begin{pmatrix}h_1&0\\h_2&0\\0&h_1\\0&h_2\end{pmatrix}$. We obtain, for every $h_1,h_2$: $\operatorname{trace}(W(I\otimes h))=(w_{1,1}+w_{2,3})h_1+(w_{1,2}+w_{2,4})h_2=0$; we deduce the conditions $w_{1,1}+w_{2,3}=w_{1,2}+w_{2,4}=0$.
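The $N=k=2$ computation above is easy to verify numerically (a NumPy sketch; $W$ is just a random matrix of the stated shape):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((2, 4))   # W in M_{2,4}
h = rng.standard_normal(2)

# I_2 (x) h is the 4x2 matrix [[h1,0],[h2,0],[0,h1],[0,h2]]
IkronH = np.kron(np.eye(2), h.reshape(2, 1))

# trace(W (I (x) h)) should equal (w11 + w23) h1 + (w12 + w24) h2
lhs = np.trace(W @ IkronH)
rhs = (W[0, 0] + W[1, 2]) * h[0] + (W[0, 1] + W[1, 3]) * h[1]
assert np.isclose(lhs, rhs)
```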


Allow me to define
$$ \eqalign { Q &= SY^TM^T - SX^T(I_N\otimes\beta)M^T \cr } $$
Then the function and its differential are
$$ \eqalign { f &= Q:Q \cr df &= 2Q:dQ \cr &= -2Q:SX^T(I_N\otimes d\beta)M^T \cr &= -2XS^TQ:(I_N\otimes d\beta)M^T \cr } $$
where $A:B = {\rm trace}(A^TB)$ denotes the Frobenius inner product. By making use of the "Kronecker-vec" relation
$$ \eqalign { {\rm vec}(AVB) &= (B^T\otimes A)\,{\rm vec}(V) \cr } $$
first in the reverse direction, and then in the forward direction, we can simplify this further.
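The Kronecker-vec relation itself can be checked numerically; note that $\operatorname{vec}$ stacks columns, so column-major order is required (a NumPy sketch with arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4))
V = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 2))

# vec() stacks columns, i.e. column-major (Fortran) order
vec = lambda X: X.reshape(-1, 1, order='F')

# vec(A V B) == (B^T kron A) vec(V)
lhs = vec(A @ V @ B)
rhs = np.kron(B.T, A) @ vec(V)
assert np.allclose(lhs, rhs)
```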

Since $M^T$ is a vector, we know that $M^T={\rm vec}(M^T)={\rm vec}(M)$, so we can write
$$ \eqalign { (I_N\otimes d\beta)M^T &= (I_N\otimes d\beta)\,{\rm vec}(M) \cr &= {\rm vec}(d\beta\,M\,I_N^T) \cr &= {\rm vec}(d\beta\,M) \cr &= {\rm vec}(I_k\,d\beta\,M) \cr &= (M^T\otimes I_k)\,{\rm vec}(d\beta) \cr &= (M^T\otimes I_k)\,d\beta \cr } $$
Substituting this into the previous expansion yields
$$ \eqalign { df &= -2XS^TQ:(M^T\otimes I_k)\,d\beta \cr &= -2\,(M^T\otimes I_k)^TXS^TQ:d\beta \cr &= -2\,(M\otimes I_k)XS^TQ:d\beta \cr } $$
So the derivative is
$$ \eqalign { \frac {\partial f} {\partial\beta} &= -2\,(M\otimes I_k)XS^TQ \cr } $$
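The resulting gradient formula can be checked against central finite differences (a NumPy sketch; all dimensions and matrices below are arbitrary test data, not from the original problem):

```python
import numpy as np

rng = np.random.default_rng(2)
q, T, N, k = 4, 6, 3, 2
S = rng.standard_normal((q, T))
Y = rng.standard_normal((N, T))
M = rng.standard_normal((1, N))
X = rng.standard_normal((N * k, T))

def f(b):
    # the objective Q'Q with Q = S Y' M' - S X' (I_N kron b) M'
    Q = S @ Y.T @ M.T - S @ X.T @ np.kron(np.eye(N), b) @ M.T
    return float(Q.T @ Q)

beta = rng.standard_normal((k, 1))
Q = S @ Y.T @ M.T - S @ X.T @ np.kron(np.eye(N), beta) @ M.T

# closed-form gradient: -2 (M kron I_k) X S' Q, a k x 1 vector
grad = -2 * np.kron(M, np.eye(k)) @ X @ S.T @ Q

# central finite differences, one coordinate at a time
eps = 1e-6
fd = np.zeros((k, 1))
for i in range(k):
    e = np.zeros((k, 1)); e[i] = 1.0
    fd[i] = (f(beta + eps * e) - f(beta - eps * e)) / (2 * eps)

assert np.allclose(grad, fd, rtol=1e-4, atol=1e-6)
```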

Update

It occurs to me that you can apply the same "Kronecker-vec" trick to $Q$ and obtain an explicit solution to your optimization problem.
$$ \eqalign { Q &= SY^TM^T - SX^T(I_N\otimes\beta)M^T \cr &= SY^TM^T - SX^T(M^T\otimes I_k)\,\beta \cr } $$
Substituting $Q$ and setting the derivative to zero yields the following linear problem, where $A = SX^T(M^T\otimes I_k)$ and $Z = SY^TM^T$:
$$ \eqalign { \bigg((M\otimes I_k)XS^T\bigg) SY^TM^T &= \bigg((M\otimes I_k)XS^T\bigg) SX^T(M^T\otimes I_k)\,\beta \cr A^TZ &= A^TA\,\beta \cr \beta &= A^{+} Z \cr &= \big[SX^T(M^T\otimes I_k)\big]^{+} (SY^TM^T) \cr } $$
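The closed-form solution can likewise be verified numerically: with random test matrices, $\beta = A^{+}Z$ satisfies the normal equations, and small perturbations never decrease the objective (a NumPy sketch; dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
q, T, N, k = 5, 7, 3, 2
S = rng.standard_normal((q, T))
Y = rng.standard_normal((N, T))
M = rng.standard_normal((1, N))
X = rng.standard_normal((N * k, T))

A = S @ X.T @ np.kron(M.T, np.eye(k))   # A = S X' (M' kron I_k), q x k
Z = S @ Y.T @ M.T                       # Z = S Y' M', q x 1
beta = np.linalg.pinv(A) @ Z            # closed-form minimizer beta = A^+ Z

# the Kronecker-vec identity: S X' (I_N kron beta) M' == A beta
assert np.allclose(S @ X.T @ np.kron(np.eye(N), beta) @ M.T, A @ beta)

# beta satisfies the normal equations A'A beta = A'Z
assert np.allclose(A.T @ A @ beta, A.T @ Z)

# and random perturbations of beta never decrease the objective
def f(b):
    Q = Z - S @ X.T @ np.kron(np.eye(N), b) @ M.T
    return float(Q.T @ Q)

for _ in range(5):
    d = 1e-3 * rng.standard_normal((k, 1))
    assert f(beta + d) >= f(beta) - 1e-9
```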