Derivative of the vectorization of a matrix product w.r.t. a matrix


Suppose $\lambda \in \mathbb{R}^{l\times 1}$, $y \in \mathbb{R}^{l\times 1}$, $A \in \mathbb{R}^{l\times mn}$, $L \in \mathbb{R}^{m\times r}$, $R \in \mathbb{R}^{n\times r}$, and

$f=\tfrac12 \|L\|_{F}^{2} + \lambda^{T} \big(y-A\, \text{vec}(LR^{T}) \big)$

I want to calculate the minimum of $f$.

So, how can I calculate

$\frac{\partial f}{\partial L} $ and $\frac{\partial f}{\partial R} $

I am confused about how to take these derivatives. Can someone explain?


There is 1 solution below


$\def\p#1#2{\frac{\partial #1}{\partial #2}}$ Use a colon to denote the trace/Frobenius product of two generic $m\times n$ matrices (in this paragraph $A,B,C$ are placeholders, not the $A$ of the question)
$$A:B \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; {\rm Tr}(A^TB)$$
This product can also be applied to vectors by treating them as single-column matrices (set $\,n={\tt1}$), in which case it reduces to the usual dot product.

The cyclic property of the trace allows the terms in such products to be rearranged in many equivalent ways, e.g.
$$\eqalign{ A:BC &= B^TA:C = AC^T:B \\ B:C &= B^T:C^T = C:B \\ B:C &= {\rm vec}(B):{\rm vec}(C) \\ }$$
Note that the terms on the two sides of the colon must have identical dimensions.
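The rearrangement rules above are easy to sanity-check numerically. The following is a minimal sketch in Python with NumPy, not part of the original answer; the helper names `frob` and `vec`, the matrix sizes, and the random test matrices are all assumptions chosen for illustration. It assumes the column-stacking convention for ${\rm vec}$.

```python
import numpy as np

def frob(X, Y):
    """Frobenius (trace) product X:Y = sum_ij X_ij * Y_ij (hypothetical helper)."""
    return float(np.sum(X * Y))

def vec(X):
    """Column-stacking vec operator, assumed here to be column-major."""
    return X.reshape(-1, order="F")

# Arbitrary illustrative sizes: A and D are m x n, B is m x k, C is k x n.
rng = np.random.default_rng(0)
m, k, n = 4, 3, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, k))
C = rng.standard_normal((k, n))
D = rng.standard_normal((m, n))

# Each entry compares two sides of one identity from the text.
checks = {
    "A:BC = Tr(A^T BC)": np.isclose(frob(A, B @ C), np.trace(A.T @ B @ C)),
    "A:BC = B^T A : C": np.isclose(frob(A, B @ C), frob(B.T @ A, C)),
    "A:BC = A C^T : B": np.isclose(frob(A, B @ C), frob(A @ C.T, B)),
    "A:D = A^T : D^T": np.isclose(frob(A, D), frob(A.T, D.T)),
    "A:D = D:A": np.isclose(frob(A, D), frob(D, A)),
    "A:D = vec(A):vec(D)": np.isclose(frob(A, D), vec(A) @ vec(D)),
}

for name, ok in checks.items():
    print(f"{name}: {bool(ok)}")
```

Each identity is compared on a single random instance, so this is a plausibility check rather than a proof.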

Use ${\rm devec}$ to denote the inverse of the vec operator
$$A={\rm devec}(a) \quad\iff\quad a={\rm vec}(A)$$
For ease of typing, define the $m\times n$ matrix
$$M = {\rm devec}(A^T\lambda)$$
Write the objective function using the above definitions, then calculate its differential.
$$\eqalign{ f &= \tfrac 12L:L - \lambda:\Big(A\,{\rm vec}(LR^T)-y\Big) \\ &= \tfrac 12L:L - A^T\lambda:{\rm vec}(LR^T) + \lambda:y \\ &= \tfrac 12L:L - M:LR^T + \lambda:y \\ df &= L:dL - M:(dL\,R^T+L\,dR^T) \\ &= (L-MR):dL - M^TL:dR \\ }$$
The constant term $\lambda:y$ does not depend on $L$ or $R$, so it drops out of the differential. Holding $R$ constant (i.e. setting $dR=0$) yields the gradient with respect to $L$, while holding $L$ constant yields the gradient with respect to $R$.
$$\eqalign{ \p{f}{L} = L-MR, \qquad \p{f}{R} = -M^TL \\ }$$
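As a check on the two gradient formulas above, one can compare them against central finite differences. This is a minimal sketch, assuming NumPy, column-major ${\rm vec}$/${\rm devec}$, and small arbitrary dimensions; the function names (`objective`, `gradients`, `numerical_gradient`) and the random data are hypothetical and not part of the original answer.

```python
import numpy as np

def vec(X):
    """Column-stacking vec operator (column-major order assumed)."""
    return X.reshape(-1, order="F")

def devec(a, shape):
    """Inverse of vec: reshape a vector back into a matrix of the given shape."""
    return a.reshape(shape, order="F")

def objective(L, R, A, lam, y):
    """f = 1/2 ||L||_F^2 + lam^T (y - A vec(L R^T)), as in the question."""
    return 0.5 * np.sum(L * L) + lam @ (y - A @ vec(L @ R.T))

def gradients(L, R, A, lam):
    """Closed-form gradients: df/dL = L - M R, df/dR = -M^T L."""
    M = devec(A.T @ lam, (L.shape[0], R.shape[0]))  # M is m x n
    return L - M @ R, -M.T @ L

def numerical_gradient(fun, X, eps=1e-6):
    """Central finite differences, one matrix entry at a time."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (fun(X + E) - fun(X - E)) / (2 * eps)
    return G

# Arbitrary illustrative dimensions and random data.
rng = np.random.default_rng(0)
l, m, n, r = 6, 4, 3, 2
lam = rng.standard_normal(l)
y = rng.standard_normal(l)
A = rng.standard_normal((l, m * n))
L = rng.standard_normal((m, r))
R = rng.standard_normal((n, r))

grad_L, grad_R = gradients(L, R, A, lam)
num_L = numerical_gradient(lambda X: objective(X, R, A, lam, y), L)
num_R = numerical_gradient(lambda X: objective(L, X, A, lam, y), R)

err_L = float(np.max(np.abs(grad_L - num_L)))
err_R = float(np.max(np.abs(grad_R - num_R)))
print("max abs error, df/dL:", err_L)
print("max abs error, df/dR:", err_R)
```

Because $f$ is quadratic in $L$ and linear in $R$, central differences carry no truncation error here, so both discrepancies should be at the level of floating-point round-off.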