Derivative of quadratic form for matrices and vectors


I'm not very familiar with multivariable calculus as it relates to matrices. Could someone explain, in detail, why $$\frac{\partial}{\partial x} \left[ x^T A x \right] = (A + A^T)x$$ for a general (not necessarily symmetric) matrix, and $$\frac{\partial}{\partial x} \left[ x^T A x \right] = 2Ax$$ when the matrix is symmetric? I'm mainly confused about how we even arrive at the first derivative; I do understand how the first formula reduces to the second when $A$ is symmetric.


4 Answers

---

We have $$ \begin{split} \frac{\partial}{\partial x} \left[ x^T A x \right]v &= \lim_{h\to0}\frac{(x+hv)^T A (x+hv)-x^T A x}{h} \\ &= \lim_{h\to0}\frac{x^T A hv+(hv)^T A x+(hv)^T A hv}{h} \\ &= x^T A v+v^T A x \\ &= x^T A v+x^T A^T v \\ &= x^T(A + A^T)v \end{split} $$ (the quadratic term $(hv)^T A (hv)/h = h\,v^T A v$ vanishes as $h\to0$), and so $$ \frac{\partial}{\partial x} \left[ x^T A x \right]= x^T(A + A^T). $$ You can identify the row vector $x^T(A + A^T)$ with the column vector $(x^T(A + A^T))^T=(A + A^T)x$, but strictly speaking they are not the same object: the derivative is a row vector (a linear functional), and the gradient is its transpose.
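As a quick sanity check on this limit computation, here is a minimal numerical sketch (using NumPy; the matrix $A$ and the vectors $x$, $v$ are arbitrary random examples) comparing a finite-difference quotient with the closed form $x^T(A+A^T)v$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # deliberately non-symmetric
x = rng.standard_normal(n)
v = rng.standard_normal(n)

f = lambda y: y @ A @ y           # f(y) = y^T A y

# finite-difference approximation of the directional derivative at x along v
h = 1e-6
fd = (f(x + h * v) - f(x)) / h

# the closed form derived above: x^T (A + A^T) v
exact = x @ (A + A.T) @ v

print(abs(fd - exact))            # small, on the order of h
```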

---

Some facts and notation:

  • Trace and Frobenius product relation $$\left\langle A, B C\right\rangle={\rm tr}(A^TBC) := A : B C$$
  • Cyclic properties of the trace/Frobenius product \begin{align} A : B C &= BC : A \\ &= B^T A : C \\ &= \text{etc.} \end{align}

Let $f := x^T A x = x:Ax$.

Compute the differential first, and then the gradient can be obtained from it. \begin{align} df &= dx:Ax + x: A dx \\ &= Ax:dx + A^Tx:dx \\ &= (A + A^T)x:dx \end{align}

Thus, the gradient is \begin{align} \frac{\partial }{\partial x} \left( x^T Ax \right)= (A + A^T)x. \end{align}

When $A$ is symmetric, i.e., $A^T = A$, then the gradient is $\frac{\partial }{\partial x} \left( x^T Ax \right)= 2Ax$.
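Both formulas can be verified numerically. A minimal sketch (using NumPy; $A$ and $x$ are arbitrary random examples) that compares a central-difference gradient with $(A+A^T)x$ in the general case and with $2Ax$ in the symmetric case:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))   # arbitrary, non-symmetric
x = rng.standard_normal(n)

def num_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f."""
    g = np.zeros_like(x)
    for k in range(len(x)):
        e = np.zeros_like(x)
        e[k] = eps
        g[k] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

g = num_grad(lambda y: y @ A @ y, x)
print(np.allclose(g, (A + A.T) @ x, atol=1e-5))   # general case: True

S = A + A.T                                       # a symmetric matrix
gs = num_grad(lambda y: y @ S @ y, x)
print(np.allclose(gs, 2 * S @ x, atol=1e-5))      # symmetric case: True
```

Central differences are exact (up to floating-point error) for quadratic functions, so the agreement is very tight.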

---

An alternative approach, though similar: we have

$$\begin{align}f(x+h)&=(x+h)^TA(x+h)\\ &=x^TAx+x^TAh+h^TAx+h^TAh\\ &=x^TAx+x^T(A+A^T)h+h^TAh\\ &=f(x)+\mathrm Df(x)h+o(\vert h\vert), \end{align}$$

where $\mathrm Df(x):h\mapsto x^T(A+A^T)h$ is linear and $h^TAh\in o(\vert h\vert)$, since $\vert h^TAh\vert \le \Vert A\Vert\,\vert h\vert^2$. Thus the linear map $\mathrm Df(x)$ is the derivative of $f$ at $x$. The gradient $(A+A^T)x$ you were given is the transpose of its matrix representation $x^T(A+A^T)$. If $A$ is symmetric, $A+A^T=2A$, which recovers the familiar $2Ax$.
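The defining property of the derivative can also be observed numerically: the remainder $f(x+th)-f(x)-\mathrm Df(x)(th)$ scales like $t^2$, i.e. it is $o(t)$. A sketch (using NumPy; the data are arbitrary random examples):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
h = rng.standard_normal(n)

f = lambda y: y @ A @ y
Df = lambda u: x @ (A + A.T) @ u        # the candidate linear map Df(x)

# the remainder f(x + t h) - f(x) - Df(t h) should scale like t^2
for t in [1e-1, 1e-2, 1e-3]:
    r = f(x + t * h) - f(x) - Df(t * h)
    print(r / t**2)                     # roughly constant, equal to h^T A h
```

For this quadratic $f$ the remainder is exactly $t^2\,h^TAh$, which is why $r/t^2$ stays constant.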

---

Suppose $x=(x_1,\ldots,x_n)^T$ and $A=(a_{ij})$. Calculating the partial derivative with respect to the $k$-th component, we have $$ \frac {\partial x^T A x}{\partial x_k} = \frac {\partial \left(\sum_{i,j} x_i a_{ij}x_j\right)}{\partial x_k} = \sum_j a_{kj}x_j +\sum_i x_i a_{ik} = \sum_j (a_{kj} + a_{jk})x_j = [(A+A^T)x]_k. $$ Hence, $\frac {\partial x^T A x}{\partial x}=(A+A^T)x$.

The case when $A$ is not symmetric can be understood in index (tensor) notation:

If $A$ is not symmetric, $a_{ij}$ is not equal to $a_{ji}$ in general. In the expression $x_i a_{ij} x_j$, the product rule requires differentiating $x_i$ and $x_j$ separately; this gives the terms $a_{kj}x_j$ and $x_i a_{ik}$ above. Since $i$ and $j$ are dummy summation indices, which can be relabeled freely, we rename $i$ to $j$ in the second term and combine the two, yielding $(a_{kj} + a_{jk})x_j$: each index slot of $a_{ij}$ is differentiated against exactly once. When $A$ is not symmetric, $a_{kj}+a_{jk}$ is not in general $2a_{kj}$, which is where the difference arises.
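The componentwise calculation can be reproduced symbolically on a small example (a sketch using SymPy; the $2\times 2$ case with illustrative symbol names):

```python
import sympy as sp

# a 2x2 example with fully generic (non-symmetric) entries
x = sp.Matrix(sp.symbols('x0 x1'))
A = sp.Matrix(2, 2, sp.symbols('a00 a01 a10 a11'))

f = (x.T * A * x)[0, 0]                  # the scalar x^T A x

# gradient: partial derivative with respect to each component x_k
grad = sp.Matrix([sp.diff(f, xk) for xk in x])

diff = sp.expand(grad - (A + A.T) * x)   # identically the zero vector
print(diff)
```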