Why is $\frac{\delta Ax}{\delta x} = A$ but $\frac{\delta x^TA}{\delta x} = A^T$? It seems both should be $A$.


In matrix calculus, it's given that $\frac{\delta Ax}{\delta x} = A $ and $\frac{\delta x^TA}{\delta x} = A^T $.

($A$ is $n\times n$, $x$ is $n\times 1$.)

However, I can't agree with "$\frac{\delta x^TA}{\delta x} = A^T $". It seems to be $A$, just like the other.

Here is why.

$$x^T = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}$$

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{pmatrix}$$

$$x^TA = \begin{pmatrix}(x_1a_{11}+x_2a_{21}+\cdots+x_na_{n1})&(x_1a_{12}+x_2a_{22}+\cdots+x_na_{n2})&\cdots&(x_1a_{1n}+x_2a_{2n}+\cdots+x_na_{nn})\end{pmatrix}$$

So the numerator $x^TA$ is a row vector and the denominator $x$ is a column vector. According to a numerator layout, we get $$\frac{\delta x^TA}{\delta x} = \begin{pmatrix}\frac{\delta(x_1a_{11}+x_2a_{21}+\cdots+x_na_{n1})}{\delta x_1}&\frac{\delta(x_1a_{12}+x_2a_{22}+\cdots+x_na_{n2})}{\delta x_1}&\cdots&\frac{\delta (x_1a_{1n}+x_2a_{2n}+\cdots+x_na_{nn})}{\delta x_1}\\ \frac{\delta(x_1a_{11}+x_2a_{21}+\cdots+x_na_{n1})}{\delta x_2}&\frac{\delta(x_1a_{12}+x_2a_{22}+\cdots+x_na_{n2})}{\delta x_2}&\cdots&\frac{\delta (x_1a_{1n}+x_2a_{2n}+\cdots+x_na_{nn})}{\delta x_2}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\delta(x_1a_{11}+x_2a_{21}+\cdots+x_na_{n1})}{\delta x_n}&\frac{\delta(x_1a_{12}+x_2a_{22}+\cdots+x_na_{n2})}{\delta x_n}&\cdots&\frac{\delta (x_1a_{1n}+x_2a_{2n}+\cdots+x_na_{nn})}{\delta x_n}\end{pmatrix}$$ And as you can see, this is $A$.

I can't find what's wrong with my derivation.

Best answer

A first point that should help you sanity-check your work:

The entry of $\frac{d\vec{a}}{d\vec{b}}$ at row $i$, column $j$ is $\frac{da_i}{db_j}$. This means each column of your derivative matrix should be taken with respect to the same $x_j$, whereas in your matrix it is each row that shares the same $x_i$.
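As a concrete check of this convention, here is a sketch of the $2\times 2$ case (the names $y_1, y_2$ are introduced only for this illustration). Write $\vec{y} = x^TA$, so that

$$y_1 = x_1a_{11}+x_2a_{21},\qquad y_2 = x_1a_{12}+x_2a_{22}.$$

Putting $\frac{dy_i}{dx_j}$ at row $i$, column $j$ gives

$$\begin{pmatrix}\frac{dy_1}{dx_1} & \frac{dy_1}{dx_2}\\ \frac{dy_2}{dx_1} & \frac{dy_2}{dx_2}\end{pmatrix} = \begin{pmatrix} a_{11} & a_{21}\\ a_{12} & a_{22}\end{pmatrix} = A^T.$$

The matrix in the question instead puts $\frac{dy_j}{dx_i}$ at row $i$, column $j$, which is the transpose of the above and therefore comes out as $A$.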

Now, the reason why the answer is $A^T$:

I think this is just a notation issue, though people are free to correct me here (and will). Look at page 8 of the Matrix Cookbook, at the vector-form definition below the matrix definition (32). It doesn't specify whether the vector in the numerator or the denominator is a row or a column vector. Under this construction, the $i^{th}$ element of $x$ is the same regardless of whether $x$ is a row or a column.

So, all this implies:

$\frac{d\vec{y}}{d\vec{x}} = \frac{d\vec{y}^T}{d\vec{x}}$ when $\vec{y}$ is a vector.

This probably comes from the fact that we are really taking the derivative of a function: $\vec{y}$ isn't really a vector but a function $f : F^m \rightarrow F^n$, with inputs $x_1, ..., x_m$ and outputs $y_1, ..., y_n$. The orientation of the output vector is arbitrary. We could just as easily flip the indices of the derivative matrix, so that the entry at row $i$, column $j$ is $\frac{da_j}{db_i}$, and it wouldn't break the math; it would just require reorienting how we combine terms.

But TL;DR: since $\frac{d\vec{y}}{d\vec{x}} = \frac{d\vec{y}^T}{d\vec{x}}$ and $(\vec{x}^TA)^T = A^T\vec{x}$, we get $\frac{d\vec{x}^TA}{d\vec{x}} = \frac{dA^T\vec{x}}{d\vec{x}} = A^T$.
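If you want to convince yourself numerically, the following is a minimal pure-Python sketch. It assumes the convention above (row $i$, column $j$ holds $\frac{dy_i}{dx_j}$) and treats both $Ax$ and $x^TA$ as flat lists, so row/column orientation plays no role. The helper names (`jacobian`, `mat_vec`, `vec_mat`, `transpose`) and the sample values of `A` and `x` are made up for this illustration. Because both maps are linear and all entries are integers, a forward difference with step 1 is exact here; that shortcut would not be valid for a nonlinear function.

```python
def jacobian(f, x):
    """Return J with J[i][j] = d f(x)[i] / d x[j] (row = output, column = input).

    Uses a forward difference with step 1, which is exact only because the
    maps tested below are linear and have integer entries.
    """
    y0 = f(x)
    J = [[0] * len(x) for _ in y0]
    for j in range(len(x)):
        shifted = list(x)
        shifted[j] += 1              # perturb only the j-th input
        yj = f(shifted)
        for i in range(len(y0)):
            J[i][j] = yj[i] - y0[i]  # change in the i-th output
    return J


def mat_vec(A, x):
    """Components of A x as a flat list."""
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]


def vec_mat(x, A):
    """Components of x^T A as a flat list."""
    return [sum(x[k] * A[k][j] for k in range(len(x))) for j in range(len(A[0]))]


def transpose(M):
    return [list(row) for row in zip(*M)]


# Arbitrary non-symmetric matrix, so A and A^T are distinguishable.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
x = [2, -1, 5]

J_Ax = jacobian(lambda v: mat_vec(A, v), x)
J_xTA = jacobian(lambda v: vec_mat(v, A), x)

print(J_Ax == A)               # d(Ax)/dx is A under this convention
print(J_xTA == transpose(A))   # d(x^T A)/dx is A^T under this convention
print(transpose(J_xTA) == A)   # the flipped-index layout gives A instead
```

The last line mirrors the matrix in the question: filling rows by input index rather than output index transposes the result, which is where the apparent $A$ comes from.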