Let $W_1\in \mathbb{R}^{n\times n}, W_2\in\mathbb{R}^{1\times n}, x\in\mathbb{R}^n$ and consider the function
$$H(W_1, W_2, x) = W_2 F(W_1x)$$ with $F\colon \mathbb{R}^n\to\mathbb{R}^n$ some differentiable function that acts component-wise (its Jacobian $D$ is diagonal).
Since, as a function of $W_1$, $H$ maps from $\mathbb{R}^{n\times n}\to\mathbb{R}$, the derivative should be an $n\times n$ matrix. I am trying to compute this derivative of this function with respect to $W_1$. What I have so far is the following:
$$\partial_{W_1} (W_2 F(W_1 x)) = \partial_{W_1}F(W_1x)\times W_2^T = \partial_{W_1}(W_1x)\times D^T\times W_2^T$$
What am I doing wrong/where to go from here...
$ \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Rather than subscripts, let's give each variable a unique name. Let's also adopt a convention wherein upppercase letters denote matrices, lowercase vectors, and Greek letters will be scalars. We'll also represent vectors as columns by default. For row vectors, use a transpose operation.
Define/rename the variables $$\eqalign{ W &= W_1 \\ v &= W_2^T \\ y &= Wx &\qiq dy = dW\,x \\ f &= f(y) &\qiq df = J\,dy,\quad J = \grad fy \\ \phi &= H \\ }$$ and use a colon to denote the matrix inner product product $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \frob{A}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ A:B &= B:A \;=\; B^T:A^T \\ \LR{AB}:C &= A:\LR{CB^T} \;=\; B:\LR{A^TC} \\ }$$ With the above tools we can differentiate the scalar-valued function $$\eqalign{ \phi &= v:f \\ d\phi &= v:df \\ &= v:J\,dy \\ &= J^Tv:dy \\ &= J^Tv:\LR{dW\,x} \\ &= J^Tvx^T:dW \\ \grad{\phi}W &= J^Tvx^T \\ }$$
Update
Rereading the question, I see that $f(y)$ is an elementwise function.
In that case we need a bit more notation. When applied to a scalar argument $(z)$, the function and its derivative are $$\eqalign{ f=f(z), \qquad f'=f'(z) \qiq df = f'\,dz \\ }$$ When applied elementwise to a vector argument the functions become vector-valued $$\eqalign{ f &= f(y), \qquad f'= f'(y) \qiq df = f'\odot dy \\ }$$ where $\odot$ is the elementwise/Hadamard product, which can be replaced with a diagonal matrix $$\eqalign{ F = \Diag{f},\quad F' = \Diag{f'} \qiq df = F'\,dy \\ }$$ So substitute $F'$ for $J\LR{{\rm and}\ J^T}$ in the previous calculation.