Why $\frac{dx^TA}{dx}=A^T$ instead of $A$


This is a really basic question, but I have spent one hour on it.

Suppose $A$ is a matrix whose entries do not depend on $x$, where $x$ is a vector. Then, $$\frac{dx^TA}{dx}=A^T.$$

Below is my attempt to prove it:

$$\frac{dx^TA}{dx}=\frac{d(A^Tx)^T}{dx}=\left(\frac{dA^Tx}{dx}\right)^T=(A^T)^T=A.$$

Frankly, I just do not understand why it's $A^T$ instead of $A$ in the original equation.

Best answer

The difference between numerator layout and denominator layout arises because a linear transformation $L:\mathbb{R}^n\to\mathbb{R}^m$ is represented by different matrices depending on whether the vectors of the two vector spaces are written as row vectors or as column vectors.

If $A_L$ is the matrix that represents the transformation as $A_L:\mathbb{R}^{n\times 1}\to\mathbb{R}^{m\times 1}$, then the matrix that represents the "corresponding" transformation from $\mathbb{R}^{1\times n}\to\mathbb{R}^{1\times m}$ (the same transformation if we think of it as a map $\mathbb{R}^n\to\mathbb{R}^m$) is the transpose $A_L^T$.
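One way to see where the transpose comes from is to write the column form and transpose both sides. The symbols $\vec u\in\mathbb{R}^{n\times 1}$ and $\vec v\in\mathbb{R}^{m\times 1}$ are introduced here only for illustration:

$$ \vec v = A_L\,\vec u \quad\Longleftrightarrow\quad \vec v^T = (A_L\,\vec u)^T = \vec u^T A_L^T . $$

So the row vector $\vec u^T$ is sent to the row vector $\vec v^T$ by multiplying on the right by $A_L^T$.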

In your case, the transformation represented by $\vec y=\vec x^T A$ can be thought of as a transformation from $\mathbb{R}^{n\times 1} \to \mathbb{R}^{1\times m}$.

Using a low-dimensional example ($n=3,\; m=2$) to keep the indices simple, consider a transformation from $\mathbb{R}^{3\times1}\to\mathbb{R}^{1\times 2}$ that can be represented in the form

$$ \vec y=[y_1\quad y_2]=\vec x^T A= \begin{bmatrix}x_1\\x_2\\x_3 \end{bmatrix}^T \begin{bmatrix} A_{11}&A_{21}\\ A_{12}&A_{22}\\ A_{13}&A_{23} \end{bmatrix}= \begin{bmatrix}x_1&x_2&x_3 \end{bmatrix} \begin{bmatrix} A_{11}&A_{21}\\ A_{12}&A_{22}\\ A_{13}&A_{23} \end{bmatrix} $$

$$ =\begin{bmatrix}(A_{11}x_1+A_{12}x_2+A_{13}x_3)& (A_{21}x_1+A_{22}x_2+A_{23}x_3) \end{bmatrix} $$

Now, if we want the derivative to be a matrix that operates on a column vector of $\mathbb{R}^{3\times1}$, it must be:

$$ \frac{d \vec y}{d \vec x}=\frac{d (\vec x^T A)}{d \vec x}= \begin{bmatrix} \frac{\partial y_1}{\partial x_1}& \frac{\partial y_1}{\partial x_2}& \frac{\partial y_1}{\partial x_3}\\ \frac{\partial y_2}{\partial x_1}& \frac{\partial y_2}{\partial x_2}& \frac{\partial y_2}{\partial x_3} \end{bmatrix}= \begin{bmatrix} A_{11}&A_{12}&A_{13}\\ A_{21}&A_{22}&A_{23} \end{bmatrix}= A^T $$
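The result can also be checked numerically. The following is a minimal sketch in plain Python, not part of the original answer: the helper names `row_times_matrix`, `jacobian` and `transpose`, and the values of `A` and `x0`, are all assumptions chosen for illustration. It builds the Jacobian with rows indexed by the outputs $y_j$ and columns by the inputs $x_i$ (the convention used above) and compares it with $A^T$.

```python
def row_times_matrix(x, A):
    """Return y = x^T A as a flat list of length m (x has length n, A is n x m)."""
    n, m = len(A), len(A[0])
    return [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian: entry [j][i] approximates d y_j / d x_i."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        xm = list(x)
        xm[i] -= h
        yp, ym = f(xp), f(xm)
        for j in range(m):
            J[j][i] = (yp[j] - ym[j]) / (2 * h)
    return J

def transpose(M):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

# Illustrative 3 x 2 matrix and evaluation point (arbitrary values).
A = [[1.0, 4.0],
     [2.0, 5.0],
     [3.0, 6.0]]
x0 = [0.5, -1.0, 2.0]

J = jacobian(lambda x: row_times_matrix(x, A), x0)
At = transpose(A)

# Largest entrywise difference between the numerical Jacobian and A^T.
max_err = max(abs(J[j][i] - At[j][i]) for j in range(2) for i in range(3))
print(len(J), len(J[0]))   # shape of the Jacobian: 2 rows, 3 columns
print(max_err < 1e-6)
```

Because the map is linear, the finite-difference Jacobian agrees with $A^T$ up to rounding error, and its shape is $2\times 3$, not $3\times 2$. Storing the partial derivatives the other way round (rows indexed by $x_i$) would give $A$ instead, which is the layout difference discussed above.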