I'm looking into convex optimization and am somewhat confused by some concepts of vector calculus. My problem starts by looking at a scalar function: $$J = f(\mathbf y) = f(\mathbf x \mathbf W + \mathbf b)$$
Let's say that I want to calculate $\frac{ \partial J}{\partial \mathbf x}$. My first guess is to split up the question: $$\frac{ \partial J}{\partial \mathbf x} = \frac{ \partial J}{\partial \mathbf y} \frac{ \partial \mathbf y}{\partial \mathbf x}$$
The first factor seems easy, as it looks like the gradient of $f$. However, I'm not sure what $\frac{ \partial \mathbf y}{\partial \mathbf x}$ means. Is this the Jacobian?
If so, given that both $\mathbf y$ and $\mathbf x$ are row vectors, I'm not sure whether it would be: $$ \begin{bmatrix} \frac{\partial \mathbf y}{\partial x_1} & ... & \frac{\partial \mathbf y}{\partial x_n} \end{bmatrix} $$
Or rather: $$ \begin{bmatrix} \frac{\partial y_1}{\partial \mathbf x} & ... & \frac{\partial y_n}{\partial \mathbf x} \end{bmatrix} $$
Finally, if I wanted to calculate $\frac{ \partial J}{\partial \mathbf W}$, which also seems like it should be possible, is there such a thing as $\frac{ \partial \mathbf y}{\partial \mathbf W}$ or $\frac{ \partial \mathbf W}{\partial \mathbf x}$?
As you've discovered, it is awkward to apply the chain rule to these types of problems because the intermediate quantities are often higher-order tensors.
A simpler approach is to use differentials. Since $dX$ has exactly the same tensor character as $X$, you can use the familiar rules of scalar/vector/tensor algebra to manipulate it.
Let's begin by writing down the variables of interest $$\eqalign{ y &= xW+b \cr J &= f(y) \cr }$$ Now find their differentials $$\eqalign{ dy &= dx\,W \cr\cr dJ &= \frac{\partial f}{\partial y}:dy \cr &= \frac{\partial f}{\partial y}:dx\,W \cr &= \frac{\partial f}{\partial y}W^T:dx \cr\cr \frac{\partial J}{\partial x} &= \frac{\partial f}{\partial y}W^T \cr\cr }$$ In the above, a colon was used to denote the inner/Frobenius product, i.e. $$\eqalign{A:B &= {\rm tr}(A^TB) \cr}$$ The third line of the derivation uses the cyclic property of the trace to move $W$ to the other side of the product: $$\eqalign{A:BC &= {\rm tr}(A^TBC) = {\rm tr}(CA^TB) = {\rm tr}\big((AC^T)^TB\big) = AC^T:B \cr}$$
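To see the result $\frac{\partial J}{\partial x} = \frac{\partial f}{\partial y}W^T$ in action, here is a small numerical sanity check. The shapes and the choice $f(y) = \sum_i y_i^2$ (so that $\frac{\partial f}{\partial y} = 2y$) are arbitrary assumptions just for illustration; the closed-form gradient is compared against a central finite-difference approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4
x = rng.standard_normal((1, n))   # row vector, 1 x n
W = rng.standard_normal((n, m))   # n x m
b = rng.standard_normal((1, m))   # row vector, 1 x m

# Example scalar function f(y) = sum(y_i^2), chosen so df/dy = 2y.
def f(y):
    return np.sum(y ** 2)

y = x @ W + b
df_dy = 2 * y                     # gradient of f w.r.t. y, shape 1 x m

# Closed form from the differential argument: dJ/dx = (df/dy) W^T
grad_x = df_dy @ W.T              # shape 1 x n

# Central finite-difference approximation of dJ/dx, one coordinate at a time
eps = 1e-6
fd = np.zeros_like(x)
for i in range(n):
    xp = x.copy(); xp[0, i] += eps
    xm = x.copy(); xm[0, i] -= eps
    fd[0, i] = (f(xp @ W + b) - f(xm @ W + b)) / (2 * eps)

print(np.allclose(grad_x, fd, atol=1e-4))  # True
```

The same differential technique answers the $\frac{\partial J}{\partial \mathbf W}$ question without ever forming a higher-order tensor, which is exactly why it is preferable to the chain-rule-with-Jacobians route.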