Derive $\frac{\partial(\mathbf{x}^T\mathbf{Ax})}{\partial \mathbf{x}} = x^TA^T+x^TA$


I see in the question "Vector derivative w.r.t. its transpose $\frac{d(Ax)}{d(x^T)}$" that the identity stated in the title holds. However, I tried to derive it myself:

$U = \mathbf{x}^T\mathbf{A}$

$\frac{\partial(\mathbf{x}^T\mathbf{Ax})}{\partial \mathbf{x}} = \frac{\partial U}{\partial\mathbf{x}} \mathbf{x} + U\frac{\partial\mathbf{x}}{\partial\mathbf{x}}$

$\frac{\partial(\mathbf{x}^T\mathbf{Ax})}{\partial \mathbf{x}} = \frac{\partial \mathbf{x}^T\mathbf{A}}{\partial\mathbf{x}} \mathbf{x} + \mathbf{x}^T\mathbf{A}$

$\frac{\partial(\mathbf{x}^T\mathbf{Ax})}{\partial \mathbf{x}} = \mathbf{A}^T\mathbf{x} + \mathbf{x}^T\mathbf{A} = 2\mathbf{x}^T\mathbf{A} \neq x^TA^T+x^TA$
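A quick numerical check can tell the two candidate results apart. The sketch below is plain Python with hypothetical helper names (`quad_form`, `central_diff_grad`, `matvec`, `transpose`). It uses a deliberately non-symmetric $A$ and central differences, which are exact up to rounding for a quadratic $f$. The gradient comes out as $(A+A^T)x$, not $2A^Tx$ (the column form of $2x^TA$).

```python
def quad_form(A, x):
    """f(x) = x^T A x, with A given as a list of rows."""
    n = len(x)
    return sum(x[j] * A[j][k] * x[k] for j in range(n) for k in range(n))

def central_diff_grad(A, x, h=1.0):
    """Central differences; exact (up to rounding) because f is quadratic."""
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        grad.append((quad_form(A, xp) - quad_form(A, xm)) / (2 * h))
    return grad

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 2], [0, 1]]  # deliberately non-symmetric
x = [1, 1]

true_grad = central_diff_grad(A, x)
candidate_sym = [a + b for a, b in zip(matvec(A, x), matvec(transpose(A), x))]  # (A + A^T) x
candidate_2At = [2 * v for v in matvec(transpose(A), x)]                         # 2 A^T x

print(true_grad, candidate_sym, candidate_2At)  # [4.0, 4.0] [4, 4] [2, 6]
```

For a symmetric $A$ the two candidates coincide, so the discrepancy only shows up when $A \neq A^T$.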

Am I applying the product rule correctly?


Answer 1 (accepted):

Let $$f(x)=x^t A x.$$ Then we have that $$f(x+h)=(x+h)^t A (x+h)=x^tAx+x^tAh+h^tAx+h^tAh,$$ that is, using $h^tAx=x^tA^th$ (a scalar equals its own transpose), $$f(x+h)-f(x)=x^tAh+h^tAx+h^tAh=x^tAh+x^tA^th+h^tAh=(x^tA+x^tA^t)h+h^tAh.$$ Can you continue?
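One way to finish the argument, sketched under the assumption that $\|\cdot\|$ is the Euclidean norm and $\|A\|$ the corresponding operator norm: by Cauchy–Schwarz, $$|h^tAh| \le \|h\|\,\|Ah\| \le \|A\|\,\|h\|^2, \qquad\text{so}\qquad \frac{\bigl|f(x+h)-f(x)-(x^tA+x^tA^t)h\bigr|}{\|h\|} \le \|A\|\,\|h\| \xrightarrow[h\to 0]{} 0.$$ Hence $Df(x)h=(x^tA+x^tA^t)h$: the derivative, written as a row vector, is $x^tA+x^tA^t$, the expression in the title, and its transpose $(A+A^t)x$ is the gradient.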

Answer 2:

Explicit indices help (repeated indices are summed over):$$\begin{align}\frac{\partial(x^TAx)}{\partial x_i}&=\partial_i(x_jA_{jk}x_k)\\&=\delta_{ij}A_{jk}x_k+x_jA_{jk}\delta_{ik}\\&=A_{ik}x_k+x_jA_{ji}\\&=(Ax+A^Tx)_i,\end{align}$$so the derivative you sought is $(A+A^T)x$ or its transpose, $x^T(A+A^T)$, depending on how you define it. The second option (a row vector) is the one for which $df=\frac{df}{dx}dx$ holds.
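The index computation translates directly into loops. The following plain-Python sketch (the helper names `quad_form`, `grad_by_indices` and `grad_by_central_diff` are hypothetical) evaluates the two summed terms $A_{ik}x_k$ and $x_jA_{ji}$ for a sample non-symmetric matrix and compares them with central differences, which are exact for a quadratic. A plain list does not distinguish rows from columns, so the same numbers stand for either $(A+A^T)x$ or $x^T(A+A^T)$.

```python
def quad_form(A, x):
    """f(x) = x_j A_jk x_k, summed over j and k."""
    n = len(x)
    return sum(x[j] * A[j][k] * x[k] for j in range(n) for k in range(n))

def grad_by_indices(A, x):
    """d/dx_i of x_j A_jk x_k = A_ik x_k + x_j A_ji (repeated indices summed)."""
    n = len(x)
    return [sum(A[i][k] * x[k] for k in range(n))      # delta_ij term: (A x)_i
            + sum(x[j] * A[j][i] for j in range(n))    # delta_ik term: (A^T x)_i
            for i in range(n)]

def grad_by_central_diff(A, x):
    """Central differences with step 1; exact here because f is quadratic."""
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += 1
        xm = list(x); xm[i] -= 1
        g.append((quad_form(A, xp) - quad_form(A, xm)) / 2)
    return g

A = [[2, -1, 0], [3, 1, 4], [0, 5, -2]]  # sample non-symmetric matrix
x = [1, 2, -1]
print(grad_by_indices(A, x))       # [8, -3, 22]
print(grad_by_central_diff(A, x))  # [8.0, -3.0, 22.0]
```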

Answer 3:

Maybe changing notation helps? I'll do it in a different way since you already got two good answers. You're looking for the gradient of the function $f\colon \Bbb R^n \to \Bbb R$ given by $f(x) = \langle Ax,x\rangle$, where $A\colon \Bbb R^n\to \Bbb R^n$ is linear. For every bilinear map $B$ we have that $$DB(x,y)(h,k) = B(x,k)+B(h,y),$$and $f = B \circ \Delta$, where $B(x,y) = \langle Ax,y\rangle$ is bilinear and $\Delta(x) = (x,x)$ is the (linear) diagonal embedding, so that $D\Delta(x)(h) = (h,h)$. So the chain rule kicks in and we have that $$\begin{align} Df(x)(h) &= D(B\circ \Delta)(x)(h) = DB(\Delta(x))\big(D\Delta(x)(h)\big) \\ &= DB(x,x)(h,h) = B(x,h)+B(h,x) \\ &= \langle Ax,h\rangle + \langle Ah,x\rangle = \langle Ax,h\rangle + \langle h, A^\top x\rangle \\ &= \langle Ax+A^\top x, h\rangle. \end{align}$$This means that $\nabla f(x) = Ax+A^\top x$, as wanted.
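The two facts about bilinear maps used in this argument can be checked on integer data, where every identity holds exactly. This plain-Python sketch uses hypothetical helpers (`B`, `inner`, `matvec`, `transpose`, `add`) and arbitrary sample vectors. It verifies the expansion $B(x+h,y+k)-B(x,y) = B(x,k)+B(h,y)+B(h,k)$, whose last term is second order and so drops out of the derivative, and then that $B(x,h)+B(h,x) = \langle Ax+A^\top x, h\rangle$.

```python
def B(A, x, y):
    """Bilinear form B(x, y) = <A x, y>; A is a list of rows."""
    n = len(x)
    return sum(A[i][c] * x[c] * y[i] for i in range(n) for c in range(n))

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [inner(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

A = [[2, -1, 0], [3, 1, 4], [0, 5, -2]]  # sample non-symmetric matrix
x, y = [1, 2, -1], [0, 3, 1]
h, k = [2, -1, 1], [1, 1, -2]

# Bilinear expansion: first-order part B(x, k) + B(h, y) plus the remainder B(h, k).
increment = B(A, add(x, h), add(y, k)) - B(A, x, y)
assert increment == B(A, x, k) + B(A, h, y) + B(A, h, k)

# On the diagonal (y = x, k = h) the first-order part is B(x, h) + B(h, x).
grad = add(matvec(A, x), matvec(transpose(A), x))  # A x + A^T x
assert B(A, x, h) + B(A, h, x) == inner(grad, h)
print(grad)  # [8, -3, 22]
```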