Vector derivative w.r.t its transpose $\frac{d(Ax)}{d(x^T)}$


Given a matrix $A$ and column vector $x$, what is the derivative of $Ax$ with respect to $x^T$ i.e. $\frac{d(Ax)}{d(x^T)}$, where $x^T$ is the transpose of $x$?

Side note - my goal is to get the known derivative formula $\frac{d(x^TAx)}{dx} = x^T(A^T + A)$ from the above rule and the chain rule.


There are 3 best solutions below

0
On BEST ANSWER

Let $f(x) = x^TAx$; you want to evaluate $\frac{df(x)}{dx}$, which is simply the gradient of $f(x)$.

There are two ways to represent the gradient: as a row vector or as a column vector. From what you have written, you represent the gradient as a row vector.

First make sure to get the dimensions of all the vectors and matrices in place.

Here $x \in \mathbb{R}^{n \times 1}$, $A \in \mathbb{R}^{n \times n}$ and $f(x) \in \mathbb{R}$

This will help you to make sure that your arithmetic operations are performed on vectors of appropriate dimensions.

Now let's move on to the differentiation.

All you need to know are the following rules for vector differentiation.

$$\frac{d(x^Ta)}{dx} = \frac{d(a^Tx)}{dx} = a^T$$ where $x,a \in \mathbb{R}^{n \times 1}$.

Note that $x^Ta = a^Tx$ because the product is a scalar, and a scalar equals its own transpose. The rule above is easily derived by differentiating component by component.
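One way to see the rule, written out in components (a sketch assuming the row-vector convention used in this answer, with $a_i$ and $x_i$ denoting the entries of $a$ and $x$):

$$a^Tx = \sum_{i=1}^{n} a_i x_i \quad\Longrightarrow\quad \frac{\partial (a^Tx)}{\partial x_j} = a_j, \quad\text{so}\quad \frac{d(a^Tx)}{dx} = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix} = a^T.$$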

(Some people follow a different convention, i.e. treating the derivative as a column vector instead of a row vector. Stick to one convention throughout and you will reach the same conclusion, up to a transpose.)

Make use of the above results to get,

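$$\frac{d(x^TAx)}{dx} = x^T A^T + x^T A$$ This follows from the product rule. First hold $Ax$ constant: $x^T(Ax)$ then has the form $x^Ta$ with $a = Ax$ and contributes $a^T = x^TA^T$. Then hold $x^TA$ constant: $(x^TA)x$ has the form $a^Tx$ with $a^T = x^TA$ and contributes $x^TA$.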

So, $$\frac{df(x)}{dx} = x^T(A^T + A)$$
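The result can be sanity-checked numerically. The following is a minimal sketch in plain Python, not part of the original answer: the helper names (`quad_form`, `analytic_gradient`, `numeric_gradient`) and the test matrix are illustrative choices, and central differences stand in for the exact derivative.

```python
# Finite-difference check of d(x^T A x)/dx = x^T (A^T + A).
# All names and the sample data below are illustrative assumptions.

def quad_form(A, x):
    """Return the scalar x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def analytic_gradient(A, x):
    """Return the row vector x^T (A^T + A) as a list of n entries."""
    n = len(x)
    return [sum(x[i] * (A[j][i] + A[i][j]) for i in range(n)) for j in range(n)]

def numeric_gradient(A, x, h=1e-6):
    """Approximate each partial derivative by a central difference."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        grad.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * h))
    return grad

# A deliberately non-symmetric matrix, so that A^T + A differs from 2A.
A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, 2.0]]
x = [1.0, -2.0, 0.5]

analytic = analytic_gradient(A, x)
numeric = numeric_gradient(A, x)
max_error = max(abs(a - b) for a, b in zip(analytic, numeric))

print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs difference:", max_error)
```

The two gradients should agree to within floating-point rounding, since central differences are exact for a quadratic function apart from rounding error.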

12
On

I think there is no such thing. $\mbox{d}(x^\mbox{T}Ax)/\mbox{d}x$ is something that, when multiplied by the change $\mbox{d}x$ in $x$, yields the change $\mbox{d}(x^\mbox{T}Ax)$ in $x^\mbox{T}Ax$. Such a thing exists and is given by the formula you quote. $\mbox{d}(Ax)/\mbox{d}(x^\mbox{T})$ would have to be something that, when multiplied by the change $\mbox{d}x^\mbox{T}$ in $x^\mbox{T}$, yields the change $\mbox{d}Ax$ in $Ax$. No such thing exists, since $x^\mbox{T}$ is a $1 \times n$ row vector and $Ax$ is an $n \times 1$ column vector.

If your main goal is to derive the derivative formula, here's a derivation:

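$(x^\mbox{T} + \mbox{d}x^\mbox{T})A(x + \mbox{d}x) = x^\mbox{T}Ax + \mbox{d}x^\mbox{T}Ax + x^\mbox{T}A\mbox{d}x + \mbox{d}x^\mbox{T}A\mbox{d}x =$

$=x^\mbox{T}Ax + x^\mbox{T}A^\mbox{T}\mbox{d}x + x^\mbox{T}A\mbox{d}x + O (\lVert \mbox{d}x \rVert^2) = x^\mbox{T}Ax + x^\mbox{T}(A^\mbox{T} + A)\mbox{d}x + O (\lVert \mbox{d}x \rVert^2)$

Here $\mbox{d}x^\mbox{T}Ax$ is a scalar, so it equals its own transpose $x^\mbox{T}A^\mbox{T}\mbox{d}x$, and the term $\mbox{d}x^\mbox{T}A\mbox{d}x$ is of second order in $\mbox{d}x$.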
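The expansion can be checked numerically. The sketch below (plain Python, with a hypothetical helper `bilinear` and arbitrarily chosen sample data) subtracts the linear term $x^\mbox{T}(A^\mbox{T} + A)\mbox{d}x$ from the actual change and records what is left for shrinking steps along a fixed direction.

```python
# Check that f(x + dx) - f(x) - x^T (A^T + A) dx equals dx^T A dx,
# which shrinks like ||dx||^2. Names and data are illustrative assumptions.

def bilinear(u, A, v):
    """Return the scalar u^T A v."""
    return sum(u[i] * A[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

A = [[2.0, 1.0],
     [-3.0, 0.5]]
x = [1.0, 2.0]
direction = [0.5, -1.0]

remainders = []
for t in (1e-1, 1e-2, 1e-3):
    dx = [t * d for d in direction]
    x_new = [xi + di for xi, di in zip(x, dx)]
    change = bilinear(x_new, A, x_new) - bilinear(x, A, x)
    # dx^T A x + x^T A dx is the same scalar as x^T (A^T + A) dx.
    linear = bilinear(dx, A, x) + bilinear(x, A, dx)
    remainders.append((t, change - linear))

for t, r in remainders:
    print(f"t = {t:g}: remainder = {r:.3e}, remainder / t^2 = {r / t**2:.6f}")
```

If the expansion is right, the ratio of the remainder to $t^2$ stays essentially constant as $t$ shrinks, which is what a second-order term looks like.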

1
On

As Sivaram points out, you must define your convention for row/column derivatives and then be consistent.

For example, you could define the derivative of a column vector with respect to a row vector (assuming the letters represent column vectors) as a matrix:

$\displaystyle \frac{d(y)}{dx^T} = D$ with $d_{i,j} = \frac{d(y_i)}{dx_j}$

And that will work (it will be consistent). For example, you get $\displaystyle \frac{d(Ax)}{dx^T} = A$
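As an illustration of this convention, here is a minimal sketch in plain Python that builds the matrix of partial derivatives of $y = Ax$ by central differences and compares it with $A$. The helper names `matvec` and `numeric_jacobian` and the sample data are assumptions made for the example.

```python
# Build D with D[i][j] ~ d(y_i)/d(x_j) for y = A x and compare it with A.
# All names and sample values are illustrative assumptions.

def matvec(A, x):
    """Return the matrix-vector product A x as a list."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def numeric_jacobian(func, x, h=1e-6):
    """Return D where D[i][j] approximates d(func(x)_i)/d(x_j)."""
    columns = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        y_plus, y_minus = func(x_plus), func(x_minus)
        columns.append([(p - m) / (2 * h) for p, m in zip(y_plus, y_minus)])
    # columns[j][i] holds d(y_i)/d(x_j); transpose to get D[i][j].
    return [list(row) for row in zip(*columns)]

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]
x = [0.5, -1.0, 2.0]

D = numeric_jacobian(lambda v: matvec(A, v), x)
max_error = max(abs(D[i][j] - A[i][j]) for i in range(3) for j in range(3))

for row in D:
    print([round(entry, 6) for entry in row])
print("max abs difference from A:", max_error)
```

Because $Ax$ is linear in $x$, the finite-difference matrix matches $A$ up to rounding, regardless of the point $x$ chosen.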

But it's not so simple to apply this, together with the product rule of differentiation, to deduce your identity, because you get two different derivatives: a row with respect to a row and a column with respect to a row, and you can't mix them (at least not without further justification).

Of course, if the matrix is symmetric, everything is simpler.
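For instance, with $A = A^T$ the formula from the question collapses to a single term (stated here in the row-vector convention of the question):

$$\frac{d(x^TAx)}{dx} = x^T(A^T + A) = 2x^TA.$$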