Gradient of matrix function using the trace


For the function

\begin{equation} \label{eq:sparsecost} \mathcal{C}\left(\mathbf{B}, \mathbf{A}\right) = \frac{1}{K} \sum_{k=1}^K \left ( \sum_j \left[ {S}_{kj} - \sum_i {A}_{ki} \: {B}_{ij} \right]^2 + \lambda \sum_i \log\left(1+\frac{{A}_{ki}^2}{\sigma^2}\right) \right), \end{equation}

I need to find the partial derivative $\partial \mathcal{C}/\partial \mathbf{A}$, and I have been told to treat $\mathcal{C}$ as a scalar function of the matrix $\mathbf{A}$ only.

It's also given that the trace function should be used and the first part of the equation can be written as $\text{Tr}(\mathbf{E}\mathbf{E}^\top) / K $, where $\mathbf{E}$ is the error matrix $\mathbf{E} = \mathbf{S}-\mathbf{AB}$ and $\text{Tr}(\mathbf{X}) = \sum_i {X}_{ii}$.

The following identities are given: \begin{equation} \frac{\partial\, \text{Tr}(\mathbf{A} \mathbf{X}^\top)}{\partial \mathbf{X}} = \frac{\partial\, \text{Tr}(\mathbf{X} \mathbf{A}^\top)}{\partial \mathbf{X}} = \mathbf{A} \qquad {\sf and} \qquad \frac{\partial\, \text{Tr}(\mathbf{X} \mathbf{A} \mathbf{X}^\top)}{\partial \mathbf{X}} = \mathbf{X}\left(\mathbf{A} + \mathbf{A}^\top\right). \end{equation}
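The second identity can be spot-checked numerically before using it. The sketch below (my own sanity check, not part of the question; the shapes and random seed are arbitrary) compares the claimed gradient $\mathbf{X}(\mathbf{A}+\mathbf{A}^\top)$ against central finite differences:

```python
import numpy as np

# Spot-check the identity  d Tr(X A X^T)/dX = X (A + A^T)
# against central finite differences (shapes and seed are arbitrary).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = rng.standard_normal((3, 4))

grad_analytic = X @ (A + A.T)

eps = 1e-6
grad_fd = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += eps
        Xm[i, j] -= eps
        grad_fd[i, j] = (np.trace(Xp @ A @ Xp.T) - np.trace(Xm @ A @ Xm.T)) / (2 * eps)

print(np.allclose(grad_analytic, grad_fd, atol=1e-4))  # True
```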

I haven't managed to get anywhere on this, so any help would be appreciated.

Answer:

$
\def\o{{\tt1}}\def\l{\lambda}\def\s{\sigma}
\def\C{\cal C}\def\p{\partial}
\def\c#1{\color{red}{#1}}
\def\LR#1{\left(#1\right)}
\def\BR#1{\Big(#1\Big)}
\def\cLR#1{\left(\c{#1}\right)}
\def\trace#1{\operatorname{Tr}\LR{#1}}
\def\qiq{\quad\implies\quad}
\def\grad#1#2{\frac{\p #1}{\p #2}}
$Use the symbols $\odot$ and $\oslash$ to denote the Hadamard (aka elementwise) product and quotient, and a colon to denote the Frobenius product, which is a very convenient and compact algebraic notation for the trace function.
$$\eqalign{
A:P &= P:A \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}P_{ij} \;=\; \trace{A^TP} \\
A:P &= I:A^TP \;=\; P^T:A^T \\
A:A &= \|A\|^2_F \\
}$$
Note that Frobenius and Hadamard products commute and that $J$ is the Hadamard identity.
$$\eqalign{
A:\LR{P\odot L} &= \LR{A\odot P}:L \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}P_{ij}L_{ij} \\
J\odot A &= A\odot J = A\oslash J = A \\\\
}$$
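These product identities are easy to confirm in NumPy, where the Hadamard product is `*` and the Frobenius product is an elementwise multiply-and-sum. A minimal sketch (my own check; the helper `frob` and the random shapes are assumptions, not part of the answer):

```python
import numpy as np

# Frobenius product A:P as an elementwise sum, checked against Tr(A^T P),
# plus the commutation of Frobenius and Hadamard products.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
P = rng.standard_normal((3, 5))
L = rng.standard_normal((3, 5))

def frob(X, Y):
    """Frobenius product X:Y = sum_ij X_ij Y_ij."""
    return (X * Y).sum()

assert np.isclose(frob(A, P), np.trace(A.T @ P))            # A:P = Tr(A^T P)
assert np.isclose(frob(A, A), np.linalg.norm(A, 'fro')**2)  # A:A = ||A||_F^2
assert np.isclose(frob(A, P * L), frob(A * P, L))           # A:(P⊙L) = (A⊙P):L
print("identities hold")
```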


For typing convenience, define the all-ones matrix $J$ and the following matrix variables. (Note that $E$ here is $AB-S$, the negative of the question's error matrix; since the cost only involves $E:E$, the sign choice does not change $\C$.)
$$\eqalign{
E &= AB-S &\qiq dE = dA\;B \\
P &= A\odot A + \s^2J &\qiq dP = 2A\odot dA \\
M &= J\oslash P \\
L &= \log(P) &\qiq dL=dP\oslash P = M\odot dP \\
}$$

Rewrite the cost function in matrix notation (i.e. without any explicit summations) using the above definitions, then calculate its differential and gradient.
$$\eqalign{
\C &= \LR{\frac{E:E + \l\log\LR{P\oslash\s^2J}:J}{K}} \\
&= \LR{\frac{E:E + \l J:L - 2\l\log(\s)\,J:J}{K}} \\
d\C &= \LR{\frac{2E:dE + \l J:dL}{K}} \\
&= \LR{\frac{2E:\LR{dA\,B} + \l J:\LR{M\odot\c{dP}}}{K}} \\
&= \LR{\frac{2\LR{EB^T}:dA + \l M:\cLR{2A\odot dA}}{K}} \\
&= \LR{\frac{2{EB^T} + 2\l{A\oslash P}}{K}}:dA \\
\grad{\C}{A} &= \LR{\frac{2{\c{E}B^T} + 2\l{A\oslash\c{P}}}{K}} \\
&= \frac 2K\BR{\cLR{AB-S}B^T+\l A\oslash\cLR{A\odot A+\s^2J}} \\\\
}$$
Since $P$ is independent of $B$, the gradient wrt $B$ is much simpler to calculate.
$$\eqalign{
\grad{\C}{B} &= \frac 2K\BR{A^T\c{E}} \;=\; \frac 2K\BR{A^T\cLR{AB-S}} \\
}$$
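Both closed-form gradients can be verified against finite differences of the original summation form of $\C$. A sketch of that check (my own; the dimensions $K,n,m$ and the values of $\l,\s$ are arbitrary, and $\s^2 J$ becomes a scalar via NumPy broadcasting):

```python
import numpy as np

# Check the closed-form gradients against central finite differences.
# Dimensions K, n, m and the values of lam, sigma are arbitrary choices.
rng = np.random.default_rng(2)
K, n, m = 4, 3, 6            # S is K x m, A is K x n, B is n x m
S = rng.standard_normal((K, m))
A = rng.standard_normal((K, n))
B = rng.standard_normal((n, m))
lam, sigma = 0.5, 0.7

def cost(A_, B_):
    """The original cost: squared error plus log-penalty, averaged over K."""
    E_ = A_ @ B_ - S
    return ((E_ * E_).sum() + lam * np.log(1 + A_**2 / sigma**2).sum()) / K

# closed-form gradients from the derivation above
E = A @ B - S
P = A * A + sigma**2          # sigma^2 * J handled by broadcasting
grad_A = (2 * E @ B.T + 2 * lam * A / P) / K
grad_B = 2 * (A.T @ E) / K

def fd_grad(f, X, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at matrix X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        G[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return G

print(np.allclose(grad_A, fd_grad(lambda A_: cost(A_, B), A), atol=1e-4))  # True
print(np.allclose(grad_B, fd_grad(lambda B_: cost(A, B_), B), atol=1e-4))  # True
```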