Let's say we have:
- $z_i$ a vector in $\mathbb{R}^k$
- $W$ a matrix in $\mathbb{R}^{d, k}$
- and $\Psi$ an invertible diagonal matrix in $\mathbb{R}^{d, d}$
I know, for example (from the Matrix Cookbook), that:
$\frac{\partial\, Tr\ z_i^T {W}^T \Psi^{-1}Wz_i}{\partial W} = 2 \Psi^{-1}Wz_iz_i^T$
and, for a vector $x_i$ in $\mathbb{R}^d$:
$\frac{\partial\, Tr\ z_i^T {W}^T \Psi^{-1}x_i}{\partial W} = \Psi^{-1}x_iz_i^T$
Now let's write $W^2 = (w_{i,j}^2)$, the matrix of the squared elements of $W$.
How can I compute $\frac{\partial\, Tr\ z_i^T {W^2}^T \Psi^{-1}W^2z_i}{\partial W}$ and $\frac{\partial\, Tr\ z_i^T {W^2}^T \Psi^{-1}x_i}{\partial W}$?
Thanks !
$ \def\a{\alpha}\def\b{\beta}\def\o{{\tt1}}\def\p{\partial} \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\ga{\grad{\a}{W}} \def\gb{\grad{\b}{W}} $For typing convenience, name the scalar functions $$\eqalign{ \a &= z_i^TW^T\Psi^{-1}Wz_i \\ \b &= z_i^TW^T\Psi^{-1}x_i \\ }$$ You have already done most of the hard work by calculating their gradients $$\eqalign{ \ga &= 2 \Psi^{-1}Wz_iz_i^T \\ \gb &= \Psi^{-1}x_iz_i^T \\ }$$ Instead of changing the meaning of $W\!,\,$ introduce a new matrix $U$ (which plays the role of the matrix called $W$ in the question's squared expressions) and express $W$ in terms of this new variable using the elementwise/Hadamard product $$\eqalign{W &= U\odot U}$$ Use the known gradient to write the differential of the scalar function in terms of $W$, then change the independent variable from $W\to U$, and recover the gradient with respect to $U$.
For example $$\eqalign{ d\a &= \LR{\ga}:dW \\ &= \LR{\ga}:\LR{2U\odot dU} \\ &= 2U\odot\LR{\ga}:dU \\ \grad{\a}{U} &= 2U\odot\LR{\ga} \\ }$$ where a colon has been used as a concise product notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$ This product plays nicely with the Hadamard product $$\eqalign{ A:(B\odot C) &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij} \\ &= (A\odot B):C \\ &= (C\odot A):B \\ }$$
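Applying the same steps to both functions, and using the gradients quoted in the question, should give $$\frac{\partial\alpha}{\partial U} = 4\,U\odot\left(\Psi^{-1}(U\odot U)z_iz_i^T\right), \qquad \frac{\partial\beta}{\partial U} = 2\,U\odot\left(\Psi^{-1}x_iz_i^T\right)$$ If you want to convince yourself numerically, here is a minimal NumPy sketch that compares these closed forms against central finite differences. The dimensions, the random data, and the helper names (`alpha`, `beta`, `grad_alpha`, `grad_beta`, `numerical_grad`) are assumptions made up for the check, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 4, 3                                  # arbitrary sizes for the check
U = rng.normal(size=(d, k))                  # the free variable, W = U * U
z = rng.normal(size=k)                       # z_i in R^k
x = rng.normal(size=d)                       # x_i in R^d
psi_inv = np.diag(1.0 / rng.uniform(0.5, 2.0, size=d))  # inverse of diagonal Psi


def alpha(U):
    """z^T W^T Psi^{-1} W z with W = U (Hadamard) U."""
    W = U * U
    return z @ W.T @ psi_inv @ W @ z


def beta(U):
    """z^T W^T Psi^{-1} x with W = U (Hadamard) U."""
    W = U * U
    return z @ W.T @ psi_inv @ x


def grad_alpha(U):
    """Closed form: 2U (Hadamard) (2 Psi^{-1} W z z^T)."""
    W = U * U
    return 2 * U * (2 * psi_inv @ W @ np.outer(z, z))


def grad_beta(U):
    """Closed form: 2U (Hadamard) (Psi^{-1} x z^T)."""
    return 2 * U * (psi_inv @ np.outer(x, z))


def numerical_grad(f, U, eps=1e-6):
    """Central finite differences, one entry of U at a time."""
    G = np.zeros_like(U)
    for idx in np.ndindex(U.shape):
        E = np.zeros_like(U)
        E[idx] = eps
        G[idx] = (f(U + E) - f(U - E)) / (2 * eps)
    return G


print(np.allclose(grad_alpha(U), numerical_grad(alpha, U), atol=1e-5))
print(np.allclose(grad_beta(U), numerical_grad(beta, U), atol=1e-5))
```

Both comparisons should agree up to finite-difference error. The loop over entries is slow but keeps the check independent of the formulas being tested.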