Gradient of $X \mapsto \mbox{tr} \left(BXCX^TB^TBXCX^TB^T\right)$

192 Views Asked by At

Let us assume that

\begin{equation} f(X)=\mbox{tr}\left(XCX^TXCX^T\right), \end{equation}

in which $C\in\mathbb{R}^{r\times r}$ is a symmetric matrix, and $X \in \mathbb{R}^{r'\times r}$. From the Page 13 of the book The Matrix Cookbook, we know that

\begin{equation} \frac{\partial f}{\partial X}=4CXX^TCX. \end{equation}

Now, let us consider the following function:

\begin{equation} g(X)=\mbox{tr}\left(BXCX^TB^TBXCX^TB^T\right), \end{equation}

where $B\in\mathbb{R}^{m\times r'}$. What is the derivative of $g$ with respect to $X$ (i.e., $\frac{\partial g}{\partial X}$)?

4

There are 4 best solutions below

0
On BEST ANSWER

Here's yet another way. Define the derivative $Df(X)$ of $f$ at $X$ as the linear map such that for any matrix $H$ proportional to some $h$ you have $$f(X+H)=f(X)+Df(X)(H)+O(h^2)$$ In particular, you can calculate $Df(X)(H)$ as $$Df(X)(H)=f(X+H)-f(X)$$ where you can suppose that $H$ is proportional to a symbol $h$ such that $h^2=0$. Hence $D$ satisfies $$\begin{align} D(f\circ g)(X)(H) &=f(g(X+H))-f(g(X)) \\ &=f(g(X)+Dg(X)(H))-f(g(X))\\ &=f(g(X))+Df(g(X))(Dg(X)(H))-f(g(X))\\ &=[Df(g(X))\circ Dg(X)](H) \end{align}$$ so that $$D(f\circ g)(X)=Df(g(X))\circ Dg(X)$$

Now your function $g$ can be written as $$g(X)=v(u(z(X)))$$ where $$z(X)=BX$$ $$u(Y)=YCY^T$$ $$v(W)=W:W^T$$ and $A:B=\mathrm{tr}(A^TB)=\mathrm{tr}(AB^T)$ satisfies $A:B=B:A=B^T:A^T$ as in greg's answer. So it suffices to calculate $Dz(X)$, $Du(Y)$ and $Dv(W)$. Again, assuming $H$ is proportional to a symbol $h$ where $h^2=0$, we have $$Dz(X)(H)=z(X+H)-z(X)=B(X+H)-BX=BH$$ $$\begin{align} Du(Y)(H) &=u(Y+H)-u(Y) \\ &= (Y+H)C(Y+H)^T - YCY^T \\ &= YCH^T + HCY^T \\ &= 2YCH^T \end{align}$$ since $C=C^T$. Also $$\begin{align} Dv(W)(H) &= v(W+H)-v(Y) \\ &= (W+H):(W+H)^T - W:W^T \\ &= W:H^T + H:W^T \\ &= 2W:H^T \end{align}$$ Hence $$\begin{align} Dg(X)(H) &= D(v\circ u\circ z)(X)(H) \\ &= [Dv(u(z(X))) \circ D(u\circ z)(X)](H) \\ &= [Dv(u(z(X))) \circ Du(z(X)) \circ Dz(X)](H) \\ &= [Dv(u(z(X))) \circ Du(z(X))](BH) \\ &= [Dv(u(z(X)))] (2z(X)C(BH)^T) \\ &= 2u(z(X))):(2z(X)C(BH)^T)^T \\ &= 4u(BX):(BXC(BH)^T)^T \\ &= 4[BXC(BX)^T]:(BXC(BH)^T)^T \\ &= 4[BXC(BX)^T]:[BHCX^TB^T] \\ &= 4[B^TBXC(BX)^T]:[HCX^TB^T] \\ &= 4[B^TBXC(BX)^T(CX^TB^T)^T]:H \\ &= 4[B^TBXCX^TB^TBXC]:H \end{align}$$ In particular, for $B=I$ you get $$Df(X)(H) = 4[XCX^TXC]:H$$

Notice that what the matrix cookbook actually says is $$\frac{\partial}{\partial X}\mathrm{tr}(X^TCXX^TCX) = 4CXX^TCX$$ but the left hand side is $(\partial/\partial X)f(X^T)$, so it's not exactly what you have.

2
On

Rewrite it as $g(X)=BX\in\mathcal M_{m,r}(\mathbb R)$, you'll have $$\text{trace}(BXCX^\top B^\top BXCX^\top B^\top)=\text{trace}(BXC(BX)^\top BXC(BX)^\top)=\text{trace}(g(X)Cg(X)^\top g(X)Cg(X)^\top)$$ I guess you can continue from here.

0
On

$ \def\o{{\tt1}} \def\Pi{\prod_{i=1}^{k-1}} \def\Pj{\prod_{j=k+1}^{n}} \def\Pk{\prod_{k=\ell}^{m}} \def\Sk{\sum_{k=1}^{n}} \def\BR#1{\Big(#1\Big)} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Consider a matrix product $(P)$ constructed from individual quadratic factors $$\eqalign{ &Q_k = B_kXC_kX^TB_k^T \qiq dQ_k = B_k\LR{dX\:C_kX^T + XC_k\:dX^T}B_k^T \\ &P_{(\ell,\,m)} = {\Pk Q_k } \;=\; Q_\ell\cdot Q_{\ell+1}\cdot Q_{\ell+2}\,\cdots\, Q_m \\ &P = P_{(\o,\,n)} \\ }$$ with the convention that the partial product $\,P_{(\ell,\,m)}\,$ is the identity matrix when $\,\ell>m\,$
and the $\{Q_k,C_k\}$ matrices are symmetric.

It will also be convenient to define the partial sums $$ F_k = P_{(\o,\,k-\o)} \qquad G_k = P_{(k+\o,\,n)} $$ so that, for any $\o\le k\le n$ $$ P = F_k\; Q_k\; G_k \qquad \qquad \qquad \quad $$

The differential of $P$ can be written as $$\eqalign{ dP &= \Sk F_k\; dQ_k\; G_k \\ &= \Sk F_k\,B_k\BR{dX\:C_kX^T + XC_k\:dX^T}B_k^T G_k \\ }$$

For typing convenience, introduce the Frobenius product with the properties $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \frob{A}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ A:B &= B:A \;=\; B^T:A^T \\ M:\LR{AB} &= \LR{MB^T}:A \;=\; \LR{A^TM}:B \\ I:B &= B:I \;=\; \trace{B} \\ }$$ The gradient of the product can be calculated as follows $$\eqalign{ \phi &= I:P \\ d\phi &= I:dP \\ &= I:\Sk{F_k\, B_k\BR{\c{dX}\:C_kX^T+XC_k\:\c{dX^T}}B_k^T G_k} \\ &= \Sk B_k^TF^T_k\:\c{I}\;G^T_k\,B_kXC_k:dX \;+\; C_kX^TB_k^TF^T_k\:\c{I}\: G^T_kB_k:dX^T \\ &= \LR{\Sk B_k^TF^T_kG^T_k\,B_kXC_k \;+\: B_k^TG_kF_kB_kXC_k}:dX \\ \grad{\phi}{X} &= {\Sk B_k^T\LR{F^T_kG^T_k \;+\: G_kF_k}B_kXC_k} \\ }$$ In the current problem, $\,n=2\,$ and the $\{B_k,\,C_k\}$ matrices are independent of $k,\:$ so the gradient expression is fairly simple.

0
On

First of all, the gradient of $f$ is wrong and it should be $$ \frac{\partial f}{\partial \mathbf{X}} = 4(\mathbf{X}\mathbf{C}\mathbf{X}^T)\mathbf{X}\mathbf{C} $$ Now let $\mathbf{Y}=\mathbf{BX}$, so that your new function writes $g(\mathbf{X})=f(\mathbf{Y})$.

Using this fact, it follows (using the same notations as Greg) that the differential of $g$ writes \begin{eqnarray*} dg &=&4(\mathbf{Y}\mathbf{C}\mathbf{Y}^T)\mathbf{Y}\mathbf{C}:d\mathbf{Y} \\ &=&4\mathbf{B}^T(\mathbf{Y}\mathbf{C}\mathbf{Y}^T)\mathbf{Y}\mathbf{C}:d\mathbf{X} \end{eqnarray*} so the gradient of $g$ is $4\mathbf{B}^T(\mathbf{Y}\mathbf{C}\mathbf{Y}^T)\mathbf{Y}\mathbf{C}$