Differentiation of the trace of a matrix


I have a function $F=\operatorname{tr}\big((X_1-f_1)(X_1-f_1)^T+(X_2-f_2)(X_2-f_2)^T+(X_1-X_2)(X_1-X_2)^T\big)$, where $X_1$, $X_2$, $f_1$ and $f_2$ are all $n\times n$ matrices.

What are $\frac{dF}{dX_1}$ and $\frac{dF}{dX_2}$? This is the derivative of the trace of a matrix expression with respect to a matrix. I would appreciate some details and a reference where I can find the principle behind this computation.

Thanks in advance!

Here is my attempt for the first and last terms:

$$\frac{d}{dX_1}\operatorname{tr}\big((X_1-f_1)(X_1-f_1)^T\big)=2(X_1-f_1)\frac{d}{dX_1}(X_1-f_1)=2(X_1-f_1)$$

and

$$\frac{d}{dX_1}\operatorname{tr}\big((X_1-X_2)(X_1-X_2)^T\big)=2(X_1-X_2)\frac{d}{dX_1}(X_1-X_2)=2(X_1-X_2)$$

by the chain rule.

Is my solution correct?


There are 2 best solutions below


Since you ask for the "principle of this computation", I recommend the differential operator technique for Jacobian identification on p. 199, Table 2 of this book. Matrix derivatives of traces of matrix functions are worked out there on p. 200 (Section 9) as a special case of this principle. It is instructive to master this disciplined approach to such computations, especially for more complicated functions.

Your conclusions for the derivative of the first and last terms are correct.
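
A short sketch of why this holds, using differentials. The shorthand $A = X_1 - f_1$ is introduced here only for illustration, and $f_1$ is assumed constant, so $dA = dX_1$:

$$\eqalign{ d\,\operatorname{tr}(AA^T) &= \operatorname{tr}(dA\,A^T + A\,dA^T) \cr &= 2\operatorname{tr}(A^T\,dA) }$$

The second line uses $\operatorname{tr}(A\,dA^T)=\operatorname{tr}(dA\,A^T)=\operatorname{tr}(A^T\,dA)$. Reading off the coefficient of $dA$ gives the gradient $2A = 2(X_1-f_1)$. The same argument with $A = X_1 - X_2$ and $X_2$ held fixed gives $2(X_1-X_2)$.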


Using the Frobenius inner product, $A:B=\operatorname{tr}(A^TB)$, the function can be written as

$$\eqalign{ F &= (X_1-f_1):(X_1-f_1) + (X_2-f_2):(X_2-f_2) \cr &+\, (X_1-X_2):(X_1-X_2) }$$

Its differential is

$$\eqalign{ dF &= 2(X_1-f_1):dX_1 + 2(X_2-f_2):dX_2 \cr &+\, 2(X_1-X_2):dX_1 + 2(X_2-X_1):dX_2 }$$

Setting $dX_2=0$ yields the gradient with respect to $X_1$:

$$\eqalign{ \frac{\partial F}{\partial X_1} &= 2(X_1-f_1) + 2(X_1-X_2) \cr &= 4X_1 - 2X_2 - 2f_1 }$$

Setting $dX_1=0$ yields the gradient with respect to $X_2$:

$$\eqalign{ \frac{\partial F}{\partial X_2} &= 2(X_2-f_2) + 2(X_2-X_1) \cr &= 4X_2 - 2X_1 - 2f_2 }$$
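
If you want to sanity-check these gradients numerically, here is a minimal sketch using NumPy and central finite differences. The function names (`F`, `analytic_grads`, `numeric_grad`), the matrix size `n = 4`, and the random test data are all assumptions made for illustration; they are not part of the question.

```python
import numpy as np

def F(X1, X2, f1, f2):
    """Objective from the question; tr(A A^T) equals the sum of squared entries of A."""
    return (np.sum((X1 - f1) ** 2)
            + np.sum((X2 - f2) ** 2)
            + np.sum((X1 - X2) ** 2))

def analytic_grads(X1, X2, f1, f2):
    """Closed-form gradients derived above."""
    g1 = 4 * X1 - 2 * X2 - 2 * f1
    g2 = 4 * X2 - 2 * X1 - 2 * f2
    return g1, g2

def numeric_grad(fun, X, eps=1e-6):
    """Entry-by-entry central-difference approximation of d fun / d X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (fun(X + E) - fun(X - E)) / (2 * eps)
    return G

# Hypothetical random test data (seeded for repeatability).
rng = np.random.default_rng(0)
n = 4
X1, X2, f1, f2 = (rng.standard_normal((n, n)) for _ in range(4))

g1, g2 = analytic_grads(X1, X2, f1, f2)
n1 = numeric_grad(lambda X: F(X, X2, f1, f2), X1)
n2 = numeric_grad(lambda X: F(X1, X, f1, f2), X2)

print("dF/dX1 matches:", np.allclose(g1, n1, atol=1e-6))
print("dF/dX2 matches:", np.allclose(g2, n2, atol=1e-6))
```

Because $F$ is quadratic, central differences should agree with the closed form up to rounding error, which makes this a convenient check for sign or factor-of-two mistakes.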