We have two matrices $\mathbf{W_1}$ and $\mathbf{W_2}$, and a column vector $\mathbf{h}$:
$$ \mathbf{W_1} = \begin{bmatrix} a & b \\ c & d \\ \end{bmatrix} \;\;\;\;\;\;\;\;\; \mathbf{W_2} = \begin{bmatrix} e & f \\ \end{bmatrix} \;\;\;\;\;\;\;\;\; \mathbf{h} = \begin{bmatrix} h_1 \\ h_2 \\ \end{bmatrix} $$
And a scalar $y$, where:
$$ y = \mathbf{W_2} \mathbf{W_1} \mathbf{h} $$
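(As a quick sanity check on the shapes, here's a sketch in NumPy with arbitrary made-up values for $a,\dots,f$ and $h_1, h_2$ — a $1 \times 2$ times $2 \times 2$ times $2 \times 1$ product collapses to a scalar:)

```python
import numpy as np

# Arbitrary made-up values for the symbols above.
W1 = np.array([[1.0, 4.0],   # [[a, b],
               [5.0, 6.0]])  #  [c, d]]  -- 2x2
W2 = np.array([[2.0, 3.0]])  # [[e, f]]  -- 1x2
h = np.array([[0.5],
              [1.5]])        # [[h1], [h2]]  -- 2x1 column vector

y = W2 @ W1 @ h              # (1x2)(2x2)(2x1) -> 1x1, i.e. a scalar
print(y.shape)               # (1, 1)
```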
I'd like to compute the derivative of $y$ with respect to $\mathbf{W_1}$, assuming numerator layout.
Using the chain rule:
$$ y = \mathbf{W_2} \mathbf{u} \;\;\;\;\;\;\;\;\; \mathbf{u} = \mathbf{W_1} \mathbf{h} $$
$$ \begin{align} \frac{\partial y}{\partial \mathbf{W_1}} &= \frac{\partial y}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \mathbf{W_1}} \\ &= \mathbf{W_2} \frac{\partial \mathbf{u}}{\partial \mathbf{W_1}} \\ &= \mathbf{W_2} \mathbf{h}^{\top} \\ \end{align} $$
All well and good. Except this isn't a $2 \times 2$ matrix! In fact, the dimensions don't even match up for matrix multiplication ($\mathbf{W_2}$ is $1 \times 2$ and $\mathbf{h}^\top$ is $1 \times 2$), so something must be incorrect.
If we take the Wikipedia definition of the derivative of a scalar by a matrix, using numerator layout, we know that actually:
$$ \frac{\partial y}{\partial \mathbf{W_1}} = \begin{bmatrix} \frac{\partial y}{\partial a} & \frac{\partial y}{\partial c} \\ \frac{\partial y}{\partial b} & \frac{\partial y}{\partial d} \\ \end{bmatrix} $$
Each element is just a scalar derivative, which we can calculate without any matrix calculus. Writing out $y = e(a h_1 + b h_2) + f(c h_1 + d h_2)$, differentiating element by element, and then factorising, we end up with:
$$ \frac{\partial y}{\partial \mathbf{W_1}} = \mathbf{h} \mathbf{W_2} $$
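(A finite-difference sketch in NumPy, with made-up concrete values, confirms that the numerator-layout gradient really is $\mathbf{h} \mathbf{W_2}$:)

```python
import numpy as np

# Made-up concrete values for the symbols above (any numbers work).
W1 = np.array([[1.0, 4.0],
               [5.0, 6.0]])  # 2x2, entries a, b, c, d
W2 = np.array([[2.0, 3.0]])  # 1x2, entries e, f
h = np.array([[0.5],
              [1.5]])        # 2x1 column vector

def y(W1):
    return (W2 @ W1 @ h).item()   # scalar

# Finite-difference gradient in numerator layout:
# entry (i, j) of the result is dy/d(W1[j, i]).
eps = 1e-6
grad = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        W1p = W1.copy()
        W1p[j, i] += eps
        grad[i, j] = (y(W1p) - y(W1)) / eps

print(np.allclose(grad, h @ W2, atol=1e-4))   # True: matches h W2
```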
Clearly, $\mathbf{h} \mathbf{W_2} \neq \mathbf{W_2} \mathbf{h}^\top $.
Can anybody suggest where I went wrong?
For $\partial \mathbf{u} / \partial \mathbf{W_1}$: $\mathbf{u}$ is a $2 \times 1$ vector and $\mathbf{W_1}$ is a $2 \times 2$ matrix, so $\partial \mathbf{u} / \partial \mathbf{W_1}$ is a $2 \times 2 \times 2$ tensor, not $\mathbf{h}^\top$.
Ref: https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions
> Notice that we could also talk about the derivative of a vector with respect to a matrix, or any of the other unfilled cells in our table. However, these derivatives are most naturally organized in a tensor of rank higher than 2, so that they do not fit neatly into a matrix.
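(To make this concrete, here's a NumPy sketch with made-up values: build the rank-3 tensor $\partial \mathbf{u} / \partial \mathbf{W_1}$ explicitly, contract it with $\mathbf{W_2}$, and the result agrees with $\mathbf{h} \mathbf{W_2}$ once arranged in numerator layout.)

```python
import numpy as np

# Made-up concrete values for the symbols in the question.
W2 = np.array([[2.0, 3.0]])  # 1x2, entries e, f
h = np.array([[0.5],
              [1.5]])        # 2x1 column vector

# du/dW1 is a rank-3 tensor: since u_i = sum_k W1[i, k] h[k],
# d u_i / d W1[j, k] = delta_{ij} * h[k].
T = np.zeros((2, 2, 2))
for i in range(2):
    for k in range(2):
        T[i, i, k] = h[k, 0]

# Contracting W2 with the tensor along u's index gives
# dy/dW1[j, k] = sum_i W2[0, i] * T[i, j, k] = W2[0, j] * h[k].
grad_jk = np.einsum('i,ijk->jk', W2[0], T)

# In numerator layout the (i, j) entry is dy/dW1[j, i], i.e. the
# transpose of grad_jk -- which is exactly h @ W2.
print(np.allclose(grad_jk.T, h @ W2))  # True
```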