Derivative of quadratic form for matrices and vectors


I'm not very familiar with multivariable calculus as it relates to matrices. Could someone explain, in detail, why $$\frac{\partial}{\partial x} \left[ x^T A x \right] = (A + A^T)x$$ for a general (not necessarily symmetric) matrix, and $$\frac{\partial}{\partial x} \left[ x^T A x \right] = 2Ax$$ when the matrix is symmetric? I'm mainly confused about how we even arrive at the first derivative; I do understand how the first formula reduces to the second when $A$ is symmetric.


4 Answers

---

We have $$ \begin{split} \frac{\partial}{\partial x} \left[ x^T A x \right]v &= \lim_{h\to0}\frac{(x+hv)^T A (x+hv)-x^T A x}{h} \\ &= \lim_{h\to0}\frac{x^T A hv+(hv)^T A x+(hv)^T A hv}{h} \\ &= x^T A v+v^T A x \\ &= x^T A v+x^T A^T v \\ &= x^T(A + A^T)v \end{split} $$ (the quadratic term $(hv)^T A (hv)/h = h\,v^T A v$ vanishes as $h\to0$), and so $$ \frac{\partial}{\partial x} \left[ x^T A x \right]= x^T(A + A^T). $$ You can identify the row vector $x^T(A + A^T)$ with the column vector $(x^T(A + A^T))^T=(A + A^T)x$, but strictly speaking they are not the same object: the derivative is a row vector (a linear functional), and the gradient is its transpose.
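As a quick sanity check on this limit computation, here is a minimal numerical sketch (using NumPy; the matrix $A$ and the vectors $x$, $v$ are arbitrary random examples) comparing a finite-difference quotient with the closed form $x^T(A+A^T)v$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # deliberately non-symmetric
x = rng.standard_normal(n)
v = rng.standard_normal(n)

f = lambda y: y @ A @ y           # f(y) = y^T A y

# finite-difference approximation of the directional derivative at x along v
h = 1e-6
fd = (f(x + h * v) - f(x)) / h

# the closed form derived above: x^T (A + A^T) v
exact = x @ (A + A.T) @ v

print(abs(fd - exact))            # small, on the order of h
```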

---

Some facts and notation:

  • Trace and Frobenius product relation $$\left\langle A, B C\right\rangle={\rm tr}(A^TBC) := A : B C$$
  • Cyclic properties of the trace/Frobenius product \begin{align} A : B C &= BC : A \\ &= B^T A : C \\ &= \text{etc.} \end{align}

Let $f := x^T A x = x:Ax$.

Compute the differential first, and then the gradient can be obtained from it. \begin{align} df &= dx:Ax + x: A dx \\ &= Ax:dx + A^Tx:dx \\ &= (A + A^T)x:dx \end{align}

Thus, the gradient is \begin{align} \frac{\partial }{\partial x} \left( x^T Ax \right)= (A + A^T)x. \end{align}

When $A$ is symmetric, i.e., $A^T = A$, then the gradient is $\frac{\partial }{\partial x} \left( x^T Ax \right)= 2Ax$.
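Both formulas can be verified numerically. A minimal sketch (using NumPy; $A$ and $x$ are arbitrary random examples) that compares a central-difference gradient with $(A+A^T)x$ in the general case and with $2Ax$ in the symmetric case:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))   # arbitrary, non-symmetric
x = rng.standard_normal(n)

def num_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f."""
    g = np.zeros_like(x)
    for k in range(len(x)):
        e = np.zeros_like(x)
        e[k] = eps
        g[k] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

g = num_grad(lambda y: y @ A @ y, x)
print(np.allclose(g, (A + A.T) @ x, atol=1e-5))   # general case: True

S = A + A.T                                       # a symmetric matrix
gs = num_grad(lambda y: y @ S @ y, x)
print(np.allclose(gs, 2 * S @ x, atol=1e-5))      # symmetric case: True
```

Central differences are exact (up to floating-point error) for quadratic functions, so the agreement is very tight.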

---

An alternative approach, though similar: we have

$$\begin{align}f(x+h)&=(x+h)^TA(x+h)\\ &=x^TAx+x^TAh+h^TAx+h^TAh\\ &=x^TAx+x^T(A+A^T)h+h^TAh\\ &=f(x)+\mathrm Df(x)h+o(\vert h\vert), \end{align}$$

where $\mathrm Df(x):h\mapsto x^T(A+A^T)h$ is linear and $h^TAh\in o(\vert h\vert)$, since $\vert h^TAh\vert \le \Vert A\Vert\,\vert h\vert^2$. Thus the linear map $\mathrm Df(x)$ is the derivative of $f$ at $x$. The gradient $(A+A^T)x$ you were given is the transpose of its matrix representation $x^T(A+A^T)$. If $A$ is symmetric, $A+A^T=2A$, which recovers the familiar $2Ax$.
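The defining property of the derivative can also be observed numerically: the remainder $f(x+th)-f(x)-\mathrm Df(x)(th)$ scales like $t^2$, i.e. it is $o(t)$. A sketch (using NumPy; the data are arbitrary random examples):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
h = rng.standard_normal(n)

f = lambda y: y @ A @ y
Df = lambda u: x @ (A + A.T) @ u        # the candidate linear map Df(x)

# the remainder f(x + t h) - f(x) - Df(t h) should scale like t^2
for t in [1e-1, 1e-2, 1e-3]:
    r = f(x + t * h) - f(x) - Df(t * h)
    print(r / t**2)                     # roughly constant, equal to h^T A h
```

For this quadratic $f$ the remainder is exactly $t^2\,h^TAh$, which is why $r/t^2$ stays constant.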

---

Suppose $x=(x_1,\ldots,x_n)^T$ and $A=(a_{ij})$. Calculating the partial derivative with respect to the $k$-th component, we have $$ \frac {\partial x^T A x}{\partial x_k} = \frac {\partial \left(\sum_{i,j} x_i a_{ij}x_j\right)}{\partial x_k} = \sum_j a_{kj}x_j +\sum_i x_i a_{ik} = \sum_j (a_{kj} + a_{jk})x_j = [(A+A^T)x]_k. $$ Hence, $\frac {\partial x^T A x}{\partial x}=(A+A^T)x$.

The case when $A$ is not symmetric can be understood in index (tensor) notation:

If $A$ is not symmetric, $a_{ij}$ is not equal to $a_{ji}$ in general. In the expression $x_i a_{ij} x_j$, the product rule requires differentiating $x_i$ and $x_j$ separately; this gives the terms $a_{kj}x_j$ and $x_i a_{ik}$ above. Since $i$ and $j$ are dummy summation indices, which can be relabeled freely, we rename $i$ to $j$ in the second term and combine the two, yielding $(a_{kj} + a_{jk})x_j$: each index slot of $a_{ij}$ is differentiated against exactly once. When $A$ is not symmetric, $a_{kj}+a_{jk}$ is not in general $2a_{kj}$, which is where the difference arises.
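The componentwise calculation can be reproduced symbolically on a small example (a sketch using SymPy; the $2\times 2$ case with illustrative symbol names):

```python
import sympy as sp

# a 2x2 example with fully generic (non-symmetric) entries
x = sp.Matrix(sp.symbols('x0 x1'))
A = sp.Matrix(2, 2, sp.symbols('a00 a01 a10 a11'))

f = (x.T * A * x)[0, 0]                  # the scalar x^T A x

# gradient: partial derivative with respect to each component x_k
grad = sp.Matrix([sp.diff(f, xk) for xk in x])

diff = sp.expand(grad - (A + A.T) * x)   # identically the zero vector
print(diff)
```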