Understanding numerator/denominator layout in matrix-calculus

Question

988 Views Asked by Bumbble Comm At 28 Mar 2026 - 8:05

This is a distilled version of this question.

Consider the following: $$ \begin{align} z & = f(\mathbf{y}) \\ \mathbf{y} & = g(\mathbf{x}) \\ \text{where, } & z \in \mathbb{R} \text{, and} \\ & \mathbf{y}, \mathbf{x} \text{ are two $(1, m)$ dimensional vectors, i.e. row-vectors} \end{align} $$

Using numerator-layout, what is the dimension of the derivative $\frac{\mathrm{d}z}{\mathrm{d}\mathbf{y}}$?

Should it be a column-vector of dimension $(m, 1)$, because $\mathbf{y}$ is a row-vector of dimension $(1, m)$? (Source)
- But using this notation causes issues when computing the derivative $\frac{\mathrm{d} z}{\mathrm{d} \mathbf{x}} = \frac{\mathrm{d} z}{\mathrm{d} \mathbf{y}} \frac{\mathrm{d} \mathbf{y}}{\mathrm{d} \mathbf{x}}$, since $\frac{\mathrm{d} \mathbf{y}}{\mathrm{d} \mathbf{x}}$ would be an $(m, m)$ matrix while $\frac{\mathrm{d} z}{\mathrm{d} \mathbf{y}}$ is an $(m, 1)$ vector, so the two do not conform for multiplication in that order.
- However, this notation does serve well when computing derivatives of the form $\frac{\mathrm{d} h(\mathbf{X})}{\mathrm{d}\mathbf{X}}$, where $\mathbf{X}$ is a matrix of dimension $(m, n)$ and $h(\mathbf{X})$ is a scalar-valued function.

Or should it be a row-vector, because according to the numerator-layout the derivative has the dimensions $\text{numerator-dimension} \times (\text{denominator-dimension})^\intercal = (1,1)\times(m, 1)$? (Source)
- Also, is my understanding of this point even correct?

PS: Also, is there a definitive guide from which I can learn matrix calculus from first principles? Although the following sources are good, they still leave a lot of gaps:

- Matrix-Calculus
- The Matrix-cookbook
- Old and New Matrix Algebra Useful for Statistics, by T. P. Minka
- The Matrix Calculus You Need For Deep Learning
- Matrix Differentiation
- Matrix Calculus
- The Guide for Matrix Calculus, by R.P. Pacelli

**Bumbble Comm** · Answer 1 · 2021-12-20 04:12:32

$ \def\qiq{\quad\implies\quad} \def\trace#1{\operatorname{Tr}(#1)} \def\shape#1{\operatorname{shape}(#1)} \def\LR#1{\left(#1\right)} \def\p{\partial}\def\o{\tt1}\def\z{\zeta} \def\grad#1#2{\frac{\partial #1}{\partial #2}} $Let's use a convention where uppercase Latin denotes a matrix, lowercase Latin a vector, and Greek letters are scalars. Always write the column vector as $y$ and the corresponding row vector as $y^T$. Finally, let's use a colon to denote the matrix inner product, i.e. $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}\;=\;\trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$ When applied to column $\big(n=\o\big)$ or row $\big(m=\o\big)$ vectors this corresponds to the usual dot product.
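The colon-product identities above can be checked numerically. The following is a minimal sketch in plain Python, using nested lists as matrices; the helper names (`colon`, `transpose`, `matmul`, `trace`) and the sample values are illustrative assumptions, not part of the answer.

```python
def colon(A, B):
    """A:B, the sum over all entries of A_ij * B_ij (A and B have the same shape)."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# Two arbitrary 2x3 matrices (illustrative values only).
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
B = [[0.5, -1.0, 2.0], [3.0, 0.0, -2.0]]

lhs = colon(A, B)                     # A:B as an entrywise sum
rhs = trace(matmul(transpose(A), B))  # Tr(A^T B)
print(lhs, rhs)

# For column vectors (n = 1) the colon product is the ordinary dot product.
u = [[1.0], [2.0], [3.0]]
v = [[4.0], [5.0], [6.0]]
print(colon(u, v))
```

Both printed values in the first line should agree, and `colon(A, A)` gives the squared Frobenius norm of `A`.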

Now write the functions using column vectors, and calculate their Jacobians, gradients and differentials as
$$\eqalign{ y &= y(x) \qiq J=\grad{y}{x} &\qiq dy = J\,dx \\ \z &= \z(y) \qiq g = \grad{\z}{y} &\qiq d\z = g:dy \\ }$$

Now calculate the gradient of $\z$ with respect to $x$ by back substitution
$$\eqalign{ d\z &= g:dy = g:J\,dx = J^Tg:dx \\ \grad{\z}{x} &= J^Tg = \LR{\grad{y}{x}}^T\LR{\grad{\z}{y}} \\ }$$

Next, consider the case of a matrix argument $Y$
$$\eqalign{ \z &= \z(Y) \qiq G = \grad{\z}{Y} &\qiq d\z = G:dY \\ }$$

In general, the shape of the gradient of a scalar-valued function should match the shape of the independent variable, e.g. $\shape{g}=\shape{y}\;$ and $\;\shape{G}=\shape{Y}$.
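The back-substitution result $J^Tg$ can be sanity-checked against finite differences. The sketch below assumes two made-up functions (a map from $\mathbb{R}^3$ to $\mathbb{R}^2$ and a scalar function on $\mathbb{R}^2$) with hand-written derivatives; none of these functions appear in the answer, they only serve as a test case.

```python
import math

def y_of_x(x):
    """Hypothetical map R^3 -> R^2, chosen only for illustration."""
    return [x[0] * x[1], math.sin(x[2]) + x[0]]

def jacobian(x):
    """J[i][j] = d y_i / d x_j, a 2x3 matrix, so that dy = J dx."""
    return [[x[1], x[0], 0.0],
            [1.0, 0.0, math.cos(x[2])]]

def zeta_of_y(y):
    """Hypothetical scalar function of y."""
    return y[0] ** 2 + 3.0 * y[1]

def grad_zeta(y):
    """g[i] = d zeta / d y_i; it has the same length as y."""
    return [2.0 * y[0], 3.0]

def chain_rule_gradient(x):
    """d zeta / d x = J^T g; it has the same length as x."""
    J = jacobian(x)
    g = grad_zeta(y_of_x(x))
    return [sum(J[i][j] * g[i] for i in range(len(g))) for j in range(len(x))]

def finite_difference_gradient(x, h=1e-6):
    """Central differences of zeta(y(x)), used only as an independent check."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += h
        x_minus = list(x)
        x_minus[j] -= h
        diff = zeta_of_y(y_of_x(x_plus)) - zeta_of_y(y_of_x(x_minus))
        grad.append(diff / (2.0 * h))
    return grad

x0 = [0.7, -1.2, 0.3]
analytic = chain_rule_gradient(x0)
numeric = finite_difference_gradient(x0)
print(analytic)
print(numeric)
```

The two printed lists should agree to several decimal places, and both have three entries, matching the length of `x0` rather than the length of `y`.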

The shape of the Jacobian of a vector-valued function is such that it can be multiplied by the differential of the independent vector, i.e. such that $\,J\,dx\;$ is dimensionally compatible.
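As a worked shape check, suppose (as an assumption for illustration; the sizes $m$ and $n$ are not fixed by the answer) that $x$ is a column vector with $n$ entries and $y$ is a column vector with $m$ entries. Then the compatibility requirement pins down the shape of $J$, and the chain-rule product $J^Tg$ automatically has the shape of $x$:

$$ \underbrace{dy}_{m\times 1} = \underbrace{J}_{m\times n}\,\underbrace{dx}_{n\times 1}, \qquad \underbrace{J^T}_{n\times m}\,\underbrace{g}_{m\times 1} \;\text{ has shape }\; n\times 1 . $$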

To make the formulas work with row vectors, simply transpose everything. $$\eqalign{ y^T &= y^T(x^T) &\qiq dy^T = dx^TJ^T \\ \z &= \z(y^T) &\qiq g^T = \grad{\z}{y^T} \\ \\ d\z &= dy^T:g^T \\&= dx^TJ^T:g^T \\&= dx^T:g^TJ \\ \grad{\z}{x^T} &= g^TJ \\ \\ }$$
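The relationship between the column form $J^Tg$ and the row form $g^TJ$ can be seen by tracking shapes explicitly. The sketch below stores vectors as nested lists (an $n\times 1$ column or a $1\times n$ row); the numbers in `J` and `g` are made-up values, and the helpers are illustrative assumptions.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def shape(A):
    return (len(A), len(A[0]))

# Hypothetical values: J is the 2x3 Jacobian of y (2 entries) with respect to
# x (3 entries); g is the gradient of zeta with respect to y, as a 2x1 column.
J = [[-1.2, 0.7, 0.0],
     [1.0, 0.0, 0.5]]
g = [[-1.68], [3.0]]

column_form = matmul(transpose(J), g)  # J^T g: gradient w.r.t. the column x
row_form = matmul(transpose(g), J)     # g^T J: gradient w.r.t. the row x^T

print(shape(column_form), shape(row_form))
print(column_form)
print(row_form)
```

The row form is the transpose of the column form, so for the row-vector setup in the question the product is (row gradient) times (Jacobian), in that order.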
