I wonder how to compute the gradient of the following scalar field
$$L(x) := \| \sin \left( W_2 \cos \left( W_1 x \right) \right) \|_2^2,$$
where $x \in \mathbb{R}^{100}$, $r \in [1, 95]$, $W_1 \in \mathbb{R}^{r \times 100}$ and $W_2 \in \mathbb{R}^{100 \times r}$.
To begin with,
$$\nabla L(x) = 2\sin(W_2 \cos(W_1 x))\cos(W_2 \cos(W_1 x)) A ,$$
where I treat $\sin(\cdot) \cos(\cdot)$ as the vector formed by the element-wise product of the corresponding vectors. However, I do not understand how to expand $A$ further (here $A$ is a placeholder for the remaining part of the expression).
$ \def\a{\alpha}\def\b{\beta}\def\g{\gamma}\def\t{\theta} \def\l{\lambda}\def\s{\sigma}\def\e{\varepsilon} \def\n{\nabla}\def\o{{\tt1}}\def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\Big(#1\Big)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} $Let's use the symbol $(\odot)$ to denote the elementwise/Hadamard product and a colon (:) to denote the trace/Frobenius product, i.e. $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$ NB: when applied to vectors $(n=\o)$ this reduces to the standard dot product.
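The definition is easy to check numerically. Below is a minimal pure-Python sketch (standard library only). The helper names `frob`, `transpose`, `matmul`, `trace` and the random $3\times 4$ test matrices are illustrative choices, not part of the answer. It confirms $A:B = \operatorname{Tr}(A^TB)$, $A:A = \|A\|_F^2$, and the vector case $u:v = u^Tv$.

```python
import math
import random

def frob(A, B):
    """Frobenius product A:B = sum_ij A_ij * B_ij (matrices stored as lists of rows)."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

random.seed(0)
m, n = 3, 4
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]

# A:B versus Tr(A^T B); A^T B is n x n and its trace sums A_ij * B_ij over i and j
print(frob(A, B), trace(matmul(transpose(A), B)))

# A:A versus ||A||_F^2, with the norm computed independently via math.hypot
print(frob(A, A), math.hypot(*(a for row in A for a in row)) ** 2)

# Vector case (n = 1): column vectors stored as m x 1 matrices, so u:v = u^T v
U = [[random.uniform(-1, 1)] for _ in range(m)]
V = [[random.uniform(-1, 1)] for _ in range(m)]
print(frob(U, V), matmul(transpose(U), V)[0][0])
```

Each line prints two numbers that agree up to floating-point rounding.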
The Frobenius and Hadamard products commute $$\eqalign{ A:(B\odot C) = (A\odot B):C = \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij} \\ }$$ Furthermore, the properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in several different but equivalent ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:AB &= CB^T:A = A^TC:B \\ }$$ Now consider the following cascade of vector variables $$\eqalign{ a &= {W_1x} &\qiq da = W_1\,dx \\ s &= \sin(a),\;y = \cos(a) &\qiq dy = -s\odot da \\ b &= {W_2y} &\qiq db = W_2\,dy \\ c &= \cos(b),\;z = \sin(b) &\qiq dz = c\odot db \\ }$$ Use the definitions above to write the objective, then calculate its differential and gradient
(mostly by back-substitution). $$\eqalign{ L &= z:z \\ dL &= 2z:dz \\ &= 2z:\LR{c\odot db} \\ &= 2\LR{c\odot z}:db \\ &= 2\LR{c\odot z}:\LR{W_2\,dy} \\ &= 2W_2^T\LR{c\odot z}:dy \\ &= 2W_2^T\LR{c\odot z}:\LR{-s\odot da} \\ &= -2s\odot\LR{W_2^T\LR{c\odot z}}:da \\ &= -2s\odot\LR{W_2^T\LR{c\odot z}}:\LR{W_1\,dx} \\ &= -2W_1^T\BR{s\odot\LR{W_2^T\LR{c\odot z}}}:dx \\ \grad{L}{x} &= -2W_1^T\BR{s\odot\LR{W_2^T\LR{c\odot z}}} \\ }$$

If you wish, you can continue the substitutions all the way back to the original variables, but the parentheses become so deeply nested that the result starts to look like a snippet of LISP: $$\eqalign{ \grad{L}{x} &= -2W_1^T\BR{\sin({W_1x})\odot\LR{W_2^T\LR{\cos(W_2\cos({W_1x}))\odot\LR{\sin({W_2\cos({W_1x})})}}}} \\ }$$

An alternate approach, which avoids the parentheses, is to replace the vector Hadamard products with diagonal matrices: $$\eqalign{ C &= \Diag{c} \\ S &= \Diag{s} \\ \grad{L}{x} &= -2W_1^T{S{W_2^T{Cz}}} \\ }$$
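As a numerical sanity check of the result, here is a minimal pure-Python sketch (standard library only, so NumPy is not assumed). It follows the cascade $a, s, y, b, c, z$ defined above, computes the Hadamard form of $\partial L/\partial x$, compares every 10th component with central finite differences, and checks that the diagonal-matrix form $-2W_1^TSW_2^TCz$ yields the same vector. The choice $r=5$, the random weights, the step size $h$ and all helper names (`matvec`, `matTvec`, `diag`, `forward`, `loss`, `grad`, `grad_diag`) are illustrative assumptions.

```python
import math
import random

def matvec(M, v):
    """M v, with M stored as a list of rows."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def matTvec(M, v):
    """M^T v, without forming the transpose explicitly."""
    out = [0.0] * len(M[0])
    for row, v_i in zip(M, v):
        for j, m in enumerate(row):
            out[j] += m * v_i
    return out

def diag(v):
    """Diag(v): square matrix with the entries of v on its diagonal."""
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]

def forward(x, W1, W2):
    """Cascade a = W1 x, s = sin(a), y = cos(a), b = W2 y, c = cos(b), z = sin(b)."""
    a = matvec(W1, x)
    s = [math.sin(t) for t in a]
    y = [math.cos(t) for t in a]
    b = matvec(W2, y)
    c = [math.cos(t) for t in b]
    z = [math.sin(t) for t in b]
    return s, c, z

def loss(x, W1, W2):
    _, _, z = forward(x, W1, W2)
    return sum(t * t for t in z)  # L = z:z

def grad(x, W1, W2):
    """Hadamard form: -2 W1^T (s * (W2^T (c * z))), products taken elementwise."""
    s, c, z = forward(x, W1, W2)
    cz = [ci * zi for ci, zi in zip(c, z)]
    u = matTvec(W2, cz)
    su = [si * ui for si, ui in zip(s, u)]
    return [-2.0 * t for t in matTvec(W1, su)]

def grad_diag(x, W1, W2):
    """Diagonal-matrix form: -2 W1^T S W2^T C z with S = Diag(s), C = Diag(c)."""
    s, c, z = forward(x, W1, W2)
    S, C = diag(s), diag(c)
    v = matTvec(W2, matvec(C, z))
    return [-2.0 * t for t in matTvec(W1, matvec(S, v))]

# x has 100 entries (forced by W1 in R^{r x 100}); r = 5 is an arbitrary choice.
random.seed(1)
N, R = 100, 5
W1 = [[random.gauss(0.0, 0.1) for _ in range(N)] for _ in range(R)]
W2 = [[random.gauss(0.0, 0.3) for _ in range(R)] for _ in range(N)]
x = [random.gauss(0.0, 1.0) for _ in range(N)]

g = grad(x, W1, W2)

# Central finite differences on every 10th coordinate of x.
h = 1e-6
def fd(i):
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (loss(xp, W1, W2) - loss(xm, W1, W2)) / (2 * h)

fd_err = max(abs(g[i] - fd(i)) for i in range(0, N, 10))
diag_err = max(abs(p - q) for p, q in zip(g, grad_diag(x, W1, W2)))
print(f"max |grad - finite diff| = {fd_err:.2e}")
print(f"max |Hadamard - diagonal| = {diag_err:.2e}")
```

Building $C$ explicitly as a $100\times 100$ matrix is wasteful. The Hadamard form does the same work with elementwise products, so it is the natural one to implement, while the diagonal form is mainly a compact way to write the result.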