A question about notation in matrix calculus: $\dfrac{\partial Ax}{\partial x}=A^T$ or $ \dfrac{\partial (Ax)^T}{\partial x}=A^T$?


I was looking for an explanation of how to differentiate quadratic forms in matrix calculus, and during my search I came across two different, seemingly contradictory identities.

On the one hand, some derivations use $\dfrac{\partial \mathbf{Ax}}{\partial\mathbf{x}}=\mathbf{A^T}$, which implies that $\dfrac{\partial \mathbf{x}}{\partial\mathbf{x}}=\mathbf{I}$.

On the other hand, my book on matrix algebra, for example, states that $ \dfrac{\partial \mathbf{(Ax)^T}}{\partial\mathbf{x}}=\dfrac{\partial \mathbf{x^TA^T}}{\partial\mathbf{x}}=\mathbf{A^T}$, which is quite the opposite of what I found elsewhere.

I can follow the derivation in both cases, but I would like to understand the deeper context and why there are different approaches. Which of the two notations is more common?


There are 2 best solutions below

0
On BEST ANSWER

Given $\mathbf{x}= \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n\\ \end{pmatrix}\in\mathbb{R^n}$ and $\mathbf{y}= \begin{pmatrix} y_1\left(x_1,\ldots,x_n\right)\\ y_2\left(x_1,\ldots,x_n\right)\\ \vdots\\ y_m\left(x_1,\ldots,x_n\right)\\ \end{pmatrix}\in\mathbb{R}^m $ we consider the Jacobian matrix $\mathbf{J}$ in the form \begin{align*} \mathbf{J}=\frac{\partial\left(y_1,\ldots,y_m\right)}{\partial\left(x_1,\ldots,x_n\right)} =\left(\frac{\partial y_i}{\partial x_j}\right)_{{1\leq i\leq m}\atop{1\leq j\leq n}}\in\mathbb{R}^{m\times n}\tag{1} \end{align*}

Given an $(m\times n)$-matrix $A=\left(a_{i,j}\right)_{{1\leq i\leq m}\atop{1\leq j\leq n}}\in\mathbb{R}^{m\times n}$, we obtain, using the numerator layout notation (1), \begin{align*} \color{blue}{\frac{\partial \mathbf{Ax}}{\partial \mathbf{x}}} &=\frac{\partial\left(\sum_{k=1}^na_{1,k}x_k,\ldots,\sum_{k=1}^na_{m,k}x_k\right)}{\partial\left(x_1,\ldots,x_n\right)}\\ &=\left(\frac{\partial\left(\sum_{k=1}^na_{i,k}x_k\right)}{\partial x_j}\right)_{{1\leq i\leq m}\atop{1\leq j\leq n}}\\ &=\left(a_{i,j}\right)_{{1\leq i\leq m}\atop{1\leq j\leq n}}\\ &\,\,\color{blue}{=\mathbf{A}} \end{align*} which corresponds to OP's second formula $ \dfrac{\partial \mathbf{(Ax)^T}}{\partial\mathbf{x}}=\dfrac{\partial \mathbf{x^TA^T}}{\partial\mathbf{x}}=\mathbf{A^T}$.
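The numerator-layout result can be checked numerically. The following is only a minimal sketch in plain Python: the helper names `matvec` and `jacobian_numerator_layout`, the example matrix `A`, and the point `x0` are assumptions made for illustration and do not come from the question. It approximates each entry $\partial y_i/\partial x_j$ by a central difference and stores it in row $i$, column $j$, as in (1).

```python
def matvec(A, x):
    """Return the product A x for a matrix given as a list of rows."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]


def jacobian_numerator_layout(f, x, h=1e-6):
    """Central-difference Jacobian with entry (i, j) = d f_i / d x_j."""
    m = len(f(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Perturb only the j-th coordinate of x.
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


# Hypothetical example data: a 2x3 matrix and a point in R^3.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x0 = [0.5, -1.0, 2.0]

# J has m = 2 rows and n = 3 columns, matching the shape of A.
J = jacobian_numerator_layout(lambda x: matvec(A, x), x0)
```

Because $\mathbf{x}\mapsto\mathbf{Ax}$ is linear, the central difference agrees with the entries of $A$ up to floating-point rounding, independently of the chosen point `x0`.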

On the other hand we can also consider the Jacobian matrix $\mathbf{J}$ in the form \begin{align*} \mathbf{J}=\frac{\partial\left(y_1,\ldots,y_m\right)}{\partial\left(x_1,\ldots,x_n\right)} =\left(\frac{\partial y_i}{\partial x_j}\right)_{{1\leq j\leq n}\atop{1\leq i\leq m}}\in\mathbb{R}^{n\times m}\tag{2} \end{align*}

Given an $(m\times n)$-matrix $A=\left(a_{i,j}\right)_{{1\leq i\leq m}\atop{1\leq j\leq n}}\in\mathbb{R}^{m\times n}$, we obtain, using the denominator layout notation (2), \begin{align*} \color{blue}{\frac{\partial \mathbf{Ax}}{\partial \mathbf{x}}} &=\frac{\partial\left(\sum_{k=1}^na_{1,k}x_k,\ldots,\sum_{k=1}^na_{m,k}x_k\right)}{\partial\left(x_1,\ldots,x_n\right)}\\ &=\left(\frac{\partial\left(\sum_{k=1}^na_{i,k}x_k\right)}{\partial x_j}\right)_{{1\leq j\leq n}\atop{1\leq i\leq m}}\\ &=\left(a_{i,j}\right)_{{1\leq j\leq n}\atop{1\leq i\leq m}}\\ &\,\,\color{blue}{=\mathbf{A}^T} \end{align*} which corresponds to OP's first formula $\dfrac{\partial \mathbf{Ax}}{\partial\mathbf{x}}=\mathbf{A^T}$.
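
As a concrete comparison of the two layouts, take the hypothetical example $A=\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}$ (chosen only for illustration, with $m=2$ and $n=3$), so that $\mathbf{y}=\mathbf{Ax}=\begin{pmatrix}x_1+2x_2+3x_3\\4x_1+5x_2+6x_3\end{pmatrix}$. Indexing rows by $y_i$ as in (1), respectively by $x_j$ as in (2), gives \begin{align*} \left(\frac{\partial y_i}{\partial x_j}\right)_{{1\leq i\leq 2}\atop{1\leq j\leq 3}} &=\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}=\mathbf{A},\\ \left(\frac{\partial y_i}{\partial x_j}\right)_{{1\leq j\leq 3}\atop{1\leq i\leq 2}} &=\begin{pmatrix}1&4\\2&5\\3&6\end{pmatrix}=\mathbf{A}^T. \end{align*} The partial derivatives are the same in both cases; only their arrangement into rows and columns differs.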

Note: I've skimmed through some of my analysis books and found only the numerator layout notation. This might indicate that in the context of analysis it is often more convenient than the denominator layout notation. It is also the notation I learned during my math studies (Analysis by Harro Heuser).

0
On

Define $F \colon \mathbb{R}^n \to \mathbb{R}^n$ by $F(x) = Ax$. The derivative of $F$ at $x$, denoted $DF(x) \colon \mathbb{R}^n \to \mathbb{R}^n$, is simply $DF(x) = A$. Naturally, $DF(x)$ is represented by the matrix $A$.

Define $G \colon \mathbb{R}^n \to M(1 \times n, \mathbb{R})$ by $G(x) = (Ax)^T$. Since $G$ is linear, the derivative of $G$ at $x$ is given by $DG(x)y = G(y) = (Ay)^T$. $DG(x)$ can be represented by a matrix by choosing bases for $\mathbb{R}^n$ and $M(1 \times n, \mathbb{R})$. For example, if you use $e_1, \dots, e_n$ as a basis for $\mathbb{R}^n$ and $e_1^T, \dots, e_n^T$ as a basis for $M(1 \times n, \mathbb{R})$, then the matrix representation of $DG(x)$ with respect to these bases is $A$.
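
The matrix representation of $DG(x)$ can also be assembled column by column. This is a rough sketch under stated assumptions: the function names `G` and `matrix_of_DG` and the $2\times 2$ example matrix are made up for illustration, and a row vector is stored as a flat list of its coordinates with respect to $e_1^T, \dots, e_n^T$. Column $j$ of the representing matrix holds the coordinates of $DG(x)e_j = G(e_j) = (Ae_j)^T$.

```python
def G(A, y):
    """G(y) = (A y)^T, stored as the list of coordinates w.r.t. e_1^T..e_n^T."""
    return [sum(a * yk for a, yk in zip(row, y)) for row in A]


def matrix_of_DG(A):
    """Matrix of DG(x) w.r.t. e_1..e_n (domain) and e_1^T..e_n^T (codomain).

    G is linear, so DG(x) y = G(y) for every x, and column j of the
    representing matrix holds the coordinates of G(e_j).
    """
    n = len(A)
    columns = []
    for j in range(n):
        e_j = [1.0 if k == j else 0.0 for k in range(n)]
        columns.append(G(A, e_j))
    # Convert the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]


# Hypothetical square example matrix (F and G are defined on R^n).
A = [[1.0, 2.0],
     [3.0, 4.0]]
M = matrix_of_DG(A)
```

With this choice of bases `M` reproduces `A` entry by entry; a different basis for $M(1 \times n, \mathbb{R})$ would give a different representing matrix for the same linear map.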