What is the gradient of $\operatorname{trace}(XCP(XC)^T)$?

92 Views Asked by At

I am really stuck at calculating

$$\frac{d}{dC} \operatorname{trace} \left( X C P (X C)^T \right)$$

where $P \in R^{r\times r}$, $X \in R^{m\times n}$ and $C \in R^{n\times r}$ . Do I need to recall $A=XC$ and then apply chain rule such that

$$\frac{d \operatorname{trace}(XCP(XC)^T)}{dC}=\frac{d\operatorname{trace}(APA^T)}{dA}\cdot \frac{dA}{dC}$$

where first factor equals $A (P+P^T) $ (following this reference) and the second is $X$. Am I right ?

More correctly, by a quick dimensionality check, it would make more sense to have $X^T (XC) (P+P^T)$ at the end. Is it right ? Could someone help me prove it ?

Moreover, is $\operatorname{trace}(XCP(XC)^T)$ convex w.r.t. $C$ ?

1

There are 1 best solutions below

2
On BEST ANSWER

Your answer is correct. I think a slightly easier way to get there (from the formulas you gave) is to use invariance of trace under cyclic permutations to say $\mathrm{trace}(XCP(XC)^T)=\mathrm{trace} (X^T X C P C^T)$. You can then apply the formula $$ \frac{d \ \mathrm{trace}(A C B C^T)}{dC}=A^T C B^T + A C B $$ from your link, with $A=X^T X$ and $B=P$.

The convexity of the function depends on your choices of $X$ and $P$. If you pick $P=I$ and $X=I$ then your function is $\mathrm{trace}(C C^T)$ which is convex. On the otherhand if you pick $P=-I$ and $X=I$ then your function is $\mathrm{trace} (-C C^T)$ which is not convex.

To see that $\mathrm{trace}(C C^T)$ is convex in $C$, first note that $$ \mathrm{trace}(C C^T) \geq 0 $$ for any $C$ since $C C^T$ is positive semidefinite and the trace of a matrix is equal to the sum of its eigenvalues. Now, letting $t \in [0,1]$ and $A,B \in \mathbb{R}^{n \times r}$ it is easy to check that $$ \mathrm{trace}(t A (t A)^T)+\mathrm{trace}((1-t) A ((1-t) A)^T)=t^2 \mathrm{trace}(A A^T)+(1-t)^2 \mathrm{trace} (B B^T) $$ and $$ t^2 \mathrm{trace}(A A^T)+(1-t)^2 \mathrm{trace} (B B^T) \leq t\ \mathrm{trace}(A A^T)+(1-t) \ \mathrm{trace} (B B^T) . $$

A very rough guess for convexity is that $X^T X (P+P^T)$ should be positive semidefinite.