I'm looking into convex optimization and am somewhat confused by some concepts of vector calculus. My problem starts by looking at a scalar function: $$J = f(\mathbf y) = f(\mathbf x \mathbf W + \mathbf b)$$
Let's say that I want to calculate $\frac{ \partial J}{\partial \mathbf x}$. My first guess is to split up the question: $$\frac{ \partial J}{\partial \mathbf x} = \frac{ \partial J}{\partial \mathbf y} \frac{ \partial \mathbf y}{\partial \mathbf x}$$
The first factor seems easy, as it looks like the gradient of $f$. However, I'm not sure what $\frac{ \partial \mathbf y}{\partial \mathbf x}$ means. Is this the Jacobian?
If so, given that both $\mathbf y$ and $\mathbf x$ are row vectors, I'm not sure whether it would be: $$ \begin{bmatrix} \frac{\partial \mathbf y}{\partial x_1} & ... & \frac{\partial \mathbf y}{\partial x_n} \end{bmatrix} $$
Or rather: $$ \begin{bmatrix} \frac{\partial y_1}{\partial \mathbf x} & ... & \frac{\partial y_n}{\partial \mathbf x} \end{bmatrix} $$
Finally, if I wanted to calculate $\frac{ \partial J}{\partial \mathbf W}$, which also seems like it should be possible, is there such a thing as $\frac{ \partial \mathbf y}{\partial \mathbf W}$ or $\frac{ \partial \mathbf W}{\partial \mathbf x}$?
As you've discovered, it is awkward to apply the chain rule to these types of problems because the intermediate quantities are often higher-order tensors.
A simpler approach is to use differentials. Since $dX$ has exactly the same tensor character as $X$, you can use the familiar rules of scalar/vector/tensor algebra to manipulate it.
Let's begin by writing down the variables of interest $$\eqalign{ y &= xW+b \cr J &= f(y) \cr }$$ Now find their differentials $$\eqalign{ dy &= dx\,W \cr\cr dJ &= \frac{\partial f}{\partial y}:dy \cr &= \frac{\partial f}{\partial y}:dx\,W \cr &= \frac{\partial f}{\partial y}W^T:dx \cr\cr \frac{\partial J}{\partial x} &= \frac{\partial f}{\partial y}W^T \cr\cr }$$ In the above, a colon was used to denote the inner/Frobenius product, i.e. $$\eqalign{A:B &= {\rm tr}(A^TB) \cr}$$ The third line of the derivation uses the cyclic property of the trace to move $W$ to the other side of the product: $$\eqalign{A:BC &= {\rm tr}(A^TBC) = {\rm tr}(CA^TB) = {\rm tr}\big((AC^T)^TB\big) = AC^T:B \cr}$$
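To see the result $\frac{\partial J}{\partial x} = \frac{\partial f}{\partial y}W^T$ in action, here is a small numerical sanity check. The shapes and the choice $f(y) = \sum_i y_i^2$ (so that $\frac{\partial f}{\partial y} = 2y$) are arbitrary assumptions just for illustration; the closed-form gradient is compared against a central finite-difference approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4
x = rng.standard_normal((1, n))   # row vector, 1 x n
W = rng.standard_normal((n, m))   # n x m
b = rng.standard_normal((1, m))   # row vector, 1 x m

# Example scalar function f(y) = sum(y_i^2), chosen so df/dy = 2y.
def f(y):
    return np.sum(y ** 2)

y = x @ W + b
df_dy = 2 * y                     # gradient of f w.r.t. y, shape 1 x m

# Closed form from the differential argument: dJ/dx = (df/dy) W^T
grad_x = df_dy @ W.T              # shape 1 x n

# Central finite-difference approximation of dJ/dx, one coordinate at a time
eps = 1e-6
fd = np.zeros_like(x)
for i in range(n):
    xp = x.copy(); xp[0, i] += eps
    xm = x.copy(); xm[0, i] -= eps
    fd[0, i] = (f(xp @ W + b) - f(xm @ W + b)) / (2 * eps)

print(np.allclose(grad_x, fd, atol=1e-4))  # True
```

The same differential technique answers the $\frac{\partial J}{\partial \mathbf W}$ question without ever forming a higher-order tensor, which is exactly why it is preferable to the chain-rule-with-Jacobians route.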