A question about notation in matrix calculus: $\dfrac{\partial Ax}{\partial x}=A^T$ or $ \dfrac{\partial (Ax)^T}{\partial x}=A^T$?

Question


114 Views Asked by Bumbble Comm At 05 Apr 2026 - 2:34

I was looking for an explanation of how quadratic forms are differentiated in matrix calculus, and during my search I came across two different, seemingly contradictory identities.

On the one hand, some derivation used $\dfrac{\partial \mathbf{Ax}}{\partial\mathbf{x}}=\mathbf{A^T}$, which implies that $\dfrac{\partial \mathbf{x}}{\partial\mathbf{x}}=\mathbf{I}$.

On the other hand, my book on matrix algebra, for example, states that $ \dfrac{\partial \mathbf{(Ax)^T}}{\partial\mathbf{x}}=\dfrac{\partial \mathbf{x^TA^T}}{\partial\mathbf{x}}=\mathbf{A^T}$, which is quite the opposite of what I found elsewhere.

I can follow the derivation in both cases, but I would like to understand the deeper context: why are there different approaches, and which of the two notations is more common?


Bumbble Comm On 16 Aug 2021 - 6:05

Define $F \colon \mathbb{R}^n \to \mathbb{R}^n$ by $F(x) = Ax$. The derivative of $F$ at $x$, denoted $DF(x) \colon \mathbb{R}^n \to \mathbb{R}^n$, is simply $DF(x) = A$. Naturally, $DF(x)$ is represented by the matrix $A$.
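As a quick numerical sanity check of $DF(x) = A$, here is a minimal sketch in plain Python. The helper names `F` and `jacobian`, the matrix entries, and the base point are all illustrative assumptions, not part of the answer. The Jacobian is stored with entry $(i, j)$ equal to $\partial F_i / \partial x_j$; because $F$ is linear, the forward difference reproduces the entries of $A$ up to rounding.

```python
def F(A, x):
    """Return the matrix-vector product A x as a list."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in A]


def jacobian(f, x, h=1.0):
    """Forward-difference Jacobian with entry (i, j) = d f_i / d x_j.

    For a linear map the difference quotient is exact up to rounding.
    """
    fx = f(x)
    columns = []
    for j in range(len(x)):
        shifted = list(x)
        shifted[j] += h
        f_shifted = f(shifted)
        columns.append([(f_shifted[i] - fx[i]) / h for i in range(len(fx))])
    # columns[j][i] holds d f_i / d x_j; transpose so that rows are indexed by i
    return [[columns[j][i] for j in range(len(x))] for i in range(len(fx))]


A = [[2.0, -1.0], [0.0, 3.0]]  # illustrative values, not taken from the answer
x0 = [5.0, 7.0]                # arbitrary base point
DF = jacobian(lambda x: F(A, x), x0)
print(DF == A)
```

The result does not depend on the base point `x0`, which matches the statement that $DF(x) = A$ for every $x$.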

Define $G \colon \mathbb{R}^n \to M(1 \times n, \mathbb{R})$ by $G(x) = (Ax)^T$. Since $G$ is linear, the derivative of $G$ at $x$ is given by $DG(x)y = G(y) = (Ay)^T$. $DG(x)$ can be represented by a matrix by choosing bases for $\mathbb{R}^n$ and $M(1 \times n, \mathbb{R})$. For example, if you use $e_1, \dots, e_n$ as a basis for $\mathbb{R}^n$ and $e_1^T, \dots, e_n^T$ as a basis for $M(1 \times n, \mathbb{R})$, then the matrix representation of $DG(x)$ with respect to these bases is $A$.
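To spell out why the representing matrix is $A$ and not $A^T$, write $a_{ij}$ for the entries of $A$ (a notation introduced here only for illustration) and apply $DG(x)$ to each basis vector $e_j$:

\begin{align*}
DG(x)\,e_j = (Ae_j)^T = \left(a_{1j},\ldots,a_{nj}\right) = \sum_{i=1}^n a_{ij}\,e_i^T .
\end{align*}

The coordinates of $DG(x)\,e_j$ with respect to $e_1^T, \dots, e_n^T$ are therefore $a_{1j},\ldots,a_{nj}$, which is the $j$-th column of $A$.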

**Bumbble Comm** · Accepted Answer

Given $\mathbf{x}= \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n\\ \end{pmatrix}\in\mathbb{R}^n$ and $\mathbf{y}= \begin{pmatrix} y_1\left(x_1,\ldots,x_n\right)\\ y_2\left(x_1,\ldots,x_n\right)\\ \vdots\\ y_m\left(x_1,\ldots,x_n\right)\\ \end{pmatrix}\in\mathbb{R}^m$, we consider the Jacobian matrix $\mathbf{J}$ in the form \begin{align*} \mathbf{J}=\frac{\partial\left(y_1,\ldots,y_m\right)}{\partial\left(x_1,\ldots,x_n\right)} =\left(\frac{\partial y_i}{\partial x_j}\right)_{{1\leq i\leq m}\atop{1\leq j\leq n}}\in\mathbb{R}^{m\times n}\tag{1} \end{align*}

Given an $(m\times n)$-matrix $A=\left(a_{i,j}\right)_{{1\leq i\leq m}\atop{1\leq j\leq n}}\in\mathbb{R}^{m\times n}$, we obtain, using the numerator layout notation (1), \begin{align*} \color{blue}{\frac{\partial \mathbf{Ax}}{\partial \mathbf{x}}} &=\frac{\partial\left(\sum_{k=1}^na_{1,k}x_k,\ldots,\sum_{k=1}^na_{m,k}x_k\right)}{\partial\left(x_1,\ldots,x_n\right)}\\ &=\left(\frac{\partial\left(\sum_{k=1}^na_{i,k}x_k\right)}{\partial x_j}\right)_{{1\leq i\leq m}\atop{1\leq j\leq n}}\\ &=\left(a_{i,j}\right)_{{1\leq i\leq m}\atop{1\leq j\leq n}}\\ &\,\,\color{blue}{=\mathbf{A}} \end{align*} which corresponds to OP's second formula $ \dfrac{\partial \mathbf{(Ax)^T}}{\partial\mathbf{x}}=\dfrac{\partial \mathbf{x^TA^T}}{\partial\mathbf{x}}=\mathbf{A^T}$.
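As a small illustration of layout (1), take $m=2$, $n=3$ and a matrix chosen only for this example (it does not appear in the original derivation):

\begin{align*}
A=\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix},\qquad
A\mathbf{x}=\begin{pmatrix}x_1+2x_2+3x_3\\4x_1+5x_2+6x_3\end{pmatrix},\qquad
\left(\frac{\partial (A\mathbf{x})_i}{\partial x_j}\right)_{{1\leq i\leq 2}\atop{1\leq j\leq 3}}
=\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}=A .
\end{align*}

Row $i$ collects the partial derivatives of the $i$-th component of $A\mathbf{x}$, so the result has the same shape as $A$.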

On the other hand we can also consider the Jacobian matrix $\mathbf{J}$ in the form \begin{align*} \mathbf{J}=\frac{\partial\left(y_1,\ldots,y_m\right)}{\partial\left(x_1,\ldots,x_n\right)} =\left(\frac{\partial y_i}{\partial x_j}\right)_{{1\leq j\leq n}\atop{1\leq i\leq m}}\in\mathbb{R}^{n\times m}\tag{2} \end{align*}

Given an $(m\times n)$-matrix $A=\left(a_{i,j}\right)_{{1\leq i\leq m}\atop{1\leq j\leq n}}\in\mathbb{R}^{m\times n}$, we obtain, using the denominator layout notation (2), \begin{align*} \color{blue}{\frac{\partial \mathbf{Ax}}{\partial \mathbf{x}}} &=\frac{\partial\left(\sum_{k=1}^na_{1,k}x_k,\ldots,\sum_{k=1}^na_{m,k}x_k\right)}{\partial\left(x_1,\ldots,x_n\right)}\\ &=\left(\frac{\partial\left(\sum_{k=1}^na_{i,k}x_k\right)}{\partial x_j}\right)_{{1\leq j\leq n}\atop{1\leq i\leq m}}\\ &=\left(a_{i,j}\right)_{{1\leq j\leq n}\atop{1\leq i\leq m}}\\ &\,\,\color{blue}{=\mathbf{A}^T} \end{align*} which corresponds to OP's first formula $\dfrac{\partial \mathbf{Ax}}{\partial\mathbf{x}}=\mathbf{A^T}$.
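The two layouts arrange the same partial derivatives $\partial (A\mathbf{x})_i / \partial x_j = a_{i,j}$ in different shapes. The following plain-Python sketch makes this concrete. The function names and the example matrix are assumptions made for illustration only.

```python
def partial(A, i, j):
    """d(Ax)_i / d x_j for y = A x, which is simply the entry a_ij."""
    return A[i][j]


def numerator_layout(A):
    """Layout (1): rows indexed by the output i, columns by the input j (m x n)."""
    m, n = len(A), len(A[0])
    return [[partial(A, i, j) for j in range(n)] for i in range(m)]


def denominator_layout(A):
    """Layout (2): rows indexed by the input j, columns by the output i (n x m)."""
    m, n = len(A), len(A[0])
    return [[partial(A, i, j) for i in range(m)] for j in range(n)]


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


A = [[1, 2, 3], [4, 5, 6]]  # illustrative 2 x 3 matrix
J1 = numerator_layout(A)    # same shape as A
J2 = denominator_layout(A)  # same shape as the transpose of A

print(J1 == A)
print(J2 == transpose(A))
```

Both comparisons hold by construction: the entries are identical, and only the role of the row and column index is swapped, so neither identity contradicts the other once the layout is fixed.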

Note: I've skimmed through some of my analysis books and found only the numerator layout notation. This might indicate that, in the context of analysis, it is more often the convenient choice than the denominator layout. It is also the notation I learned during my math studies (Analysis by Harro Heuser).

Numerator layout notation:
- Foundations of Modern Analysis, Vol. 1 by J. Dieudonné
- Introduction to Calculus and Analysis II by R. Courant
- Principles of Mathematical Analysis by W. Rudin
- Calculus of Several Variables by S. Lang
- Lehrbuch der Analysis, Teil 2 by H. Heuser
