How to calculate the first-order differential of the real scalar function below


Given the following cost function \begin{equation} J(\mathbf{W})= \frac{1}{2}tr[\log(\mathbf{W}^T\mathbf{R}_1\mathbf{W}\mathbf{A})-\mathbf{W}^T\mathbf{R}_2\mathbf{W}] \end{equation} where $\mathbf{W}$ is an $N \times r$ matrix, $\mathbf{R}_1$ and $\mathbf{R}_2$ are $N \times N$ symmetric matrices, and $\mathbf{A}$ is an $r \times r$ diagonal matrix.

My answer and the steps to the first-order differential of $J(\mathbf{W})$ are as follows: \begin{align} dJ(\mathbf{W})= &\frac{1}{2}tr\{(\mathbf{W}^T\mathbf{R}_1\mathbf{W}\mathbf{A})^{-1}[(d\mathbf{W}^T)\mathbf{R}_1\mathbf{W}+\mathbf{W}^T\mathbf{R}_1(d\mathbf{W})]\mathbf{A} - 2\mathbf{W}^T\mathbf{R}_2(d\mathbf{W})\} \\ = & \frac{1}{2}tr\{ (d\mathbf{W}^T)\mathbf{R}_1\mathbf{W}\mathbf{A}(\mathbf{W}^T\mathbf{R}_1\mathbf{W}\mathbf{A})^{-1} + \mathbf{A}(\mathbf{W}^T\mathbf{R}_1\mathbf{W}\mathbf{A})^{-1}\mathbf{W}^T\mathbf{R}_1(d\mathbf{W}) \}-tr\{ \mathbf{W}^T\mathbf{R}_2(d\mathbf{W})\} \\ =& \frac{1}{2}tr\{( \mathbf{A}\mathbf{W}^T\mathbf{R}_1\mathbf{W})^{-1} \mathbf{A}\mathbf{W}^T\mathbf{R}_1(d\mathbf{W}) + \mathbf{A}(\mathbf{W}^T\mathbf{R}_1\mathbf{W}\mathbf{A})^{-1}\mathbf{W}^T\mathbf{R}_1(d\mathbf{W}) \}-tr\{ \mathbf{W}^T\mathbf{R}_2(d\mathbf{W})\} \\ =& tr\{(\mathbf{W}^T\mathbf{R}_1\mathbf{W})^{-1}\mathbf{W}^T\mathbf{R}_1(d\mathbf{W}) \}-tr\{ \mathbf{W}^T\mathbf{R}_2(d\mathbf{W})\} \end{align}

So we can obtain the gradient of $J(\mathbf{W})$ via the identification rule for the gradient of a real scalar function: \begin{equation} \nabla_1 J(\mathbf{W}) = \mathbf{R}_1 \mathbf{W}(\mathbf{W}^T\mathbf{R}_1\mathbf{W})^{-1} - \mathbf{R}_2\mathbf{W} \end{equation}
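A quick numerical sanity check of this gradient is possible. The sketch below (NumPy; all variable names are illustrative, and I assume $\mathbf{R}_1$ positive definite and $\mathbf{A}$ with positive diagonal so that the principal $\log$ is well defined and $tr(\log M) = \log\det M$) compares $\nabla_1 J$ against central finite differences of $J$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 6, 3

# R1 symmetric positive definite so W^T R1 W A has positive eigenvalues
B = rng.standard_normal((N, N))
R1 = B @ B.T + N * np.eye(N)
C = rng.standard_normal((N, N))
R2 = (C + C.T) / 2                      # symmetric
A = np.diag(rng.uniform(0.5, 2.0, r))   # positive diagonal
W = rng.standard_normal((N, r))

def J(W):
    # tr(log(M)) = log(det(M)) when the principal log of M exists
    sign, logdet = np.linalg.slogdet(W.T @ R1 @ W @ A)
    return 0.5 * (logdet - np.trace(W.T @ R2 @ W))

# Candidate gradient from the derivation above
G = R1 @ W @ np.linalg.inv(W.T @ R1 @ W) - R2 @ W

# Central finite differences: dJ should match tr(G^T dW) entrywise
eps = 1e-6
err = 0.0
for i in range(N):
    for j in range(r):
        E = np.zeros((N, r)); E[i, j] = 1.0
        fd = (J(W + eps * E) - J(W - eps * E)) / (2 * eps)
        err = max(err, abs(fd - G[i, j]))
print("max abs error:", err)
```

For well-conditioned random instances like this one, the reported error is at the level of finite-difference noise, far below the size of the gradient entries.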

However, the corresponding paper gives a different answer: \begin{equation} \nabla_2 J(\mathbf{W}) = \mathbf{R}_1\mathbf{W}(\mathbf{A}\mathbf{W}^T\mathbf{R}_1\mathbf{W}\mathbf{A}^{-1})^{-1} - \mathbf{R}_2\mathbf{W} \end{equation}

I strictly followed the identification rule for the gradient of a real scalar function.

1. Is my computational procedure wrong?
2. If so, where did it go wrong?
3. If my answer is wrong, can someone give the correct procedure?

BEST ANSWER

What is strange is that your derivative does not depend on $A$. At first I thought your calculation was wrong. I came back to your post and, this time, did the calculation using $(tr(\log(U)))'=tr(U^{-1}U')$.

I assume that the matrices are real and that the considered $\log$ is the principal logarithm. If one supposes that $W$ has maximal rank, that $R_1$ and $A$ are invertible and symmetric, and that $W^TR_1WA$ has no eigenvalues $<0$, then your computation is correct.

Beware: if you choose the matrices at random, then $W^TR_1WA$ often has eigenvalues $<0$; in that case you must choose another branch of $\log$, and your cost function is then no longer necessarily real-valued.
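This failure mode is easy to observe empirically. The sketch below (NumPy; the sampling scheme is only one illustrative choice) draws random symmetric $R_1$, random diagonal $A$, and random $W$, and counts how often $W^TR_1WA$ has an eigenvalue with negative real part:

```python
import numpy as np

rng = np.random.default_rng(2)
N, r = 6, 3
hits = 0
for _ in range(100):
    C = rng.standard_normal((N, N))
    R1 = (C + C.T) / 2                    # symmetric, generally indefinite
    A = np.diag(rng.standard_normal(r))   # random diagonal, random signs
    W = rng.standard_normal((N, r))
    eig = np.linalg.eigvals(W.T @ R1 @ W @ A)
    if np.any(np.real(eig) < 0):
        hits += 1
print(hits, "of 100 random trials hit an eigenvalue with negative real part")
```

In such trials a clear majority of draws land outside the domain of the principal logarithm, which is why the structural assumptions above (e.g. $R_1$ positive definite, $A$ with positive diagonal) matter.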

EDIT. Here the derivative does not depend on $A$ because, when $A$ is a constant invertible matrix:

$(tr(\log(UA)))'=tr((UA)^{-1}(UA)')=tr(A^{-1}U^{-1}U'A)=tr(U^{-1}U')$.
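The $A$-independence of this identity can also be checked numerically. A minimal sketch (NumPy, with illustrative names; $tr(\log M)$ is again evaluated as $\log\det M$, valid when the principal log exists) differentiates $t \mapsto tr(\log(U(t)A))$ along a curve $U(t) = U_0 + t\,U'$ and compares with $tr(U^{-1}U')$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

M = rng.standard_normal((n, n))
U0 = M @ M.T + n * np.eye(n)           # SPD, so the principal log exists
dU = rng.standard_normal((n, n))
dU = (dU + dU.T) / 2                   # symmetric direction U'
A = np.diag(rng.uniform(0.5, 2.0, n))  # constant invertible diagonal A

def tr_log(M):
    # tr(log(M)) = log(det(M)) when the principal log of M exists
    return np.linalg.slogdet(M)[1]

# Finite-difference derivative of t -> tr(log(U(t) A)) at t = 0
eps = 1e-6
fd = (tr_log((U0 + eps * dU) @ A) - tr_log((U0 - eps * dU) @ A)) / (2 * eps)

# Analytic value tr(U^{-1} U'): note that A appears nowhere
analytic = np.trace(np.linalg.inv(U0) @ dU)
print(fd, analytic)
```

The two values agree to finite-difference precision regardless of which invertible $A$ is chosen, which is exactly why $A$ drops out of the gradient.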