I am trying to apply the multivariable chain rule to a composition of two functions. The outer function is an element-wise function $\phi$ and the inner one is the matrix-vector product $W^Tx$:
$$D_x[\phi(W^Tx)]$$
As far as I understand, in multivariable calculus the derivative of a composition $D_xf(g(x))$ is $D_gf(g(x))\cdot D_xg(x)$. Hence I would expect to get $D_x[\phi(W^Tx)] = \phi'(W^Tx)\cdot W^T$, but this is incorrect. Can someone show me how to do this properly?
Let $y=W^Tx$, so that $\phi = \phi(y)$.
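Then the differential in terms of the Hadamard ($\circ$) product is $$\eqalign{ d\phi &= \phi'\circ dy \cr &= {\rm Diag}(\phi')\,dy \cr &= {\rm Diag}(\phi')\,W^Tdx \cr }$$ where $\phi'=\phi'(y)$ is the vector of element-wise derivatives. And the derivative is $$\frac{\partial\phi}{\partial x}={\rm Diag}(\phi'(W^Tx))\,W^T$$ So the chain rule itself was applied correctly; the missing piece is that the Jacobian of an element-wise function is the diagonal matrix ${\rm Diag}(\phi')$, not the vector $\phi'$.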
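If it helps, the result can be checked numerically. The sketch below compares ${\rm Diag}(\phi'(W^Tx))\,W^T$ against a central finite-difference Jacobian. The choice $\phi=\tanh$, the sizes ($W\in\mathbb{R}^{4\times 3}$, $x\in\mathbb{R}^4$), and the helper names are my own assumptions for illustration, not part of the question.

```python
import numpy as np

# Assumption for illustration: phi = tanh, applied element-wise.
def phi(y):
    return np.tanh(y)

def phi_prime(y):
    return 1.0 - np.tanh(y) ** 2

def analytic_jacobian(W, x):
    """Jacobian of phi(W^T x) w.r.t. x, i.e. Diag(phi'(W^T x)) W^T."""
    y = W.T @ x
    return np.diag(phi_prime(y)) @ W.T

def numeric_jacobian(W, x, eps=1e-6):
    """Central finite-difference approximation of the same Jacobian."""
    m = W.shape[1]          # output dimension (length of W^T x)
    n = x.size              # input dimension
    J = np.zeros((m, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        # Column j holds the sensitivity of every output to x_j.
        J[:, j] = (phi(W.T @ (x + step)) - phi(W.T @ (x - step))) / (2 * eps)
    return J

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # arbitrary sizes, chosen for the example
x = rng.standard_normal(4)

J_analytic = analytic_jacobian(W, x)
J_numeric = numeric_jacobian(W, x)
max_abs_diff = np.max(np.abs(J_analytic - J_numeric))
print(J_analytic.shape, max_abs_diff)
```

Note that `np.diag(phi_prime(y)) @ W.T` is the same as scaling row $i$ of $W^T$ by $\phi'(y_i)$, so in practice one would probably write `phi_prime(y)[:, None] * W.T` and avoid building the diagonal matrix.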