Derivative of elementwise operation

I am trying to take the derivative of $$1^T((AA^T) \circ (AA^T))1$$ with respect to $A$, where $A$ is an $m$ by $n$ matrix. I tried to use the chain rule but could not make it work. I know the derivative should have dimension $m$ by $n$, but I cannot get this result. Does anyone know how to approach this question?

There is 1 best solution below

$ \def\o{{\tt1}}\def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\Big(#1\Big)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $For ease of typing, define the all-ones matrix $J=\o\o^T$, the symmetric matrix $S=AA^T$, and the Frobenius product $(:)$, which is a concise notation for the trace of a product of two matrices of the same size
$$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$
and which (conveniently) commutes with the Hadamard product
$$\eqalign{ (A\circ B):C &= A:(B\circ C) \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij} \\ }$$

Use the above notation to write the cost function, then calculate its differential and gradient.
$$\eqalign{ \phi &= \BR{(S\circ S)\o}:\o \\ &= (S\circ S):J \\ &= S:(S\circ J) \\ &= S:S \\ d\phi &= 2S:dS \\ &= 2S:(dA\,A^T+A\,dA^T) \\ &= 2(S+S^T):(dA\,A^T) \\ &= 4S:(dA\,A^T) \\ &= 4SA:dA \\ &= 4AA^TA:dA \\ \grad{\phi}{A} &= 4AA^TA \\ }$$

The step that collects the two terms uses $X:Y = X^T:Y^T$, the next one uses $S^T=S$, and moving $A^T$ across the product uses $X:(YZ^T) = (XZ):Y$. The result $4AA^TA$ is $m\times n$, the same shape as $A$, as expected.
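A quick numerical sanity check of the gradient can be reassuring. The following is only a minimal sketch, assuming NumPy is available; the function names `phi`, `grad_phi`, and `numerical_grad`, the matrix size $4\times 3$, and the step size `eps` are illustrative choices and not part of the derivation. It compares the closed form $4AA^TA$ with central finite differences taken one entry of $A$ at a time.

```python
import numpy as np


def phi(A):
    """Cost 1^T ((A A^T) o (A A^T)) 1, i.e. the sum of squared entries of A A^T."""
    S = A @ A.T
    return np.sum(S * S)


def grad_phi(A):
    """Closed-form gradient 4 A A^T A from the derivation above."""
    return 4.0 * A @ A.T @ A


def numerical_grad(f, A, eps=1e-6):
    """Central finite-difference gradient, perturbing one entry of A at a time."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E) - f(A - E)) / (2.0 * eps)
    return G


# Hypothetical test matrix: m = 4, n = 3, fixed seed for repeatability.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

max_err = np.max(np.abs(grad_phi(A) - numerical_grad(phi, A)))
print("gradient shape:", grad_phi(A).shape)
print("max abs difference:", max_err)
```

The difference should be small compared with the size of the gradient entries; it will not be exactly zero because of finite-difference truncation and rounding.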