I was looking for an explanation of how to differentiate quadratic forms in matrix calculus, and during my search I came across two different, seemingly contradictory identities.
On the one hand, one derivation I found uses $\dfrac{\partial \mathbf{Ax}}{\partial\mathbf{x}}=\mathbf{A}^T$, which implies that $\dfrac{\partial \mathbf{x}}{\partial\mathbf{x}}=\mathbf{I}$.
On the other hand, my book on matrix algebra, for example, states that $\dfrac{\partial (\mathbf{Ax})^T}{\partial\mathbf{x}}=\dfrac{\partial \mathbf{x}^T\mathbf{A}^T}{\partial\mathbf{x}}=\mathbf{A}^T$, which is quite the opposite of what I found elsewhere.
I can follow the derivation in both cases, but I would like to understand the deeper context and why there are different approaches. Which of the two notations is more common?
Given $\mathbf{x}= \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n\\ \end{pmatrix}\in\mathbb{R}^n$ and $\mathbf{y}= \begin{pmatrix} y_1\left(x_1,\ldots,x_n\right)\\ y_2\left(x_1,\ldots,x_n\right)\\ \vdots\\ y_m\left(x_1,\ldots,x_n\right)\\ \end{pmatrix}\in\mathbb{R}^m $, we can consider the Jacobian matrix $\mathbf{J}$ in the form \begin{align*} \mathbf{J}=\frac{\partial\left(y_1,\ldots,y_m\right)}{\partial\left(x_1,\ldots,x_n\right)} =\left(\frac{\partial y_i}{\partial x_j}\right)_{{1\leq i\leq m}\atop{1\leq j\leq n}}\in\mathbb{R}^{m\times n}\tag{1} \end{align*} Here row $i$ holds the partial derivatives of $y_i$. This is the numerator layout.
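As a small worked example of form (1), take the linear map from the question, $\mathbf{y}=\mathbf{Ax}$, and assume $\mathbf{A}=\left(a_{ij}\right)\in\mathbb{R}^{m\times n}$ is constant (the entry names $a_{ij}$ are introduced here only for illustration). Then \begin{align*} y_i=\sum_{k=1}^n a_{ik}x_k \quad\Longrightarrow\quad \frac{\partial y_i}{\partial x_j}=a_{ij}, \end{align*} so the matrix in (1) is $\mathbf{J}=\mathbf{A}$, and for $\mathbf{A}=\mathbf{I}$ it reduces to $\mathbf{I}$.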
On the other hand, we can also consider the Jacobian matrix $\mathbf{J}$ in the transposed form \begin{align*} \mathbf{J}=\frac{\partial\left(y_1,\ldots,y_m\right)}{\partial\left(x_1,\ldots,x_n\right)} =\left(\frac{\partial y_i}{\partial x_j}\right)_{{1\leq j\leq n}\atop{1\leq i\leq m}}\in\mathbb{R}^{n\times m}\tag{2} \end{align*} Here column $i$ holds the partial derivatives of $y_i$, so (2) is the transpose of (1). This is the denominator layout.
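The difference between (1) and (2) can also be checked numerically. The following is only a minimal sketch in plain Python, not a reference implementation: the helper names `jacobian_form_1` and `transpose`, the matrix `A`, and the point `x0` are all made up for illustration. It approximates the partial derivatives of $\mathbf{y}=\mathbf{Ax}$ by central differences, stores them as in (1), and then transposes the result to obtain (2).

```python
def jacobian_form_1(f, x, h=1e-6):
    """Central-difference Jacobian laid out as in (1): J[i][j] ~ d y_i / d x_j."""
    m = len(f(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def transpose(M):
    """Swap rows and columns, turning layout (1) into layout (2)."""
    return [list(col) for col in zip(*M)]


# Hypothetical example data: a 2 x 3 matrix, so m = 2 and n = 3.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]


def linear_map(x):
    """y = A x, written out with plain lists."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]


x0 = [0.5, -1.0, 2.0]
J1 = jacobian_form_1(linear_map, x0)  # form (1): 2 x 3, approximately A
J2 = transpose(J1)                    # form (2): 3 x 2, approximately A^T
```

Under these assumptions `J1` matches $\mathbf{A}$ and `J2` matches $\mathbf{A}^T$ up to rounding, which is exactly the $\mathbf{A}$ versus $\mathbf{A}^T$ discrepancy between the two conventions: the derivative is the same, only its arrangement in a matrix differs.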
Note: I skimmed through some of my analysis books and found only the numerator layout notation. This might indicate that in the context of analysis it is more often the convenient choice than the denominator layout notation. It is also the notation I learned during my math studies (from the analysis textbook by Harro Heuser).
Numerator layout notation:
Foundations of Modern Analysis, Vol. 1 by J. Dieudonné
Introduction to Calculus and Analysis II by R. Courant
Principles of Mathematical Analysis by W. Rudin
Calculus of Several Variables by S. Lang
Lehrbuch der Analysis, Teil 2 by H. Heuser