Can you use the chain rule in vector calculus to compute the gradient of a matrix?


From the definition of the Jacobian I previously determined that the gradient of $x^TA$ with respect to $x$, for $x \in \mathbb{R}^m, A \in \mathbb{R}^{m \times m}$, is equal to $A^T$.

Now I want to determine the gradient of $x^TAx$. From single-variable calculus I remember both the product rule and the chain rule, so I tried to apply the same ideas here, given that I already know the gradient of $x^TA$.

$\frac{\partial x^TAx}{\partial x} = \frac{\partial (x^TA)(x)}{\partial x} = \frac{\partial(x^TA)}{\partial x}(x) + (x^TA)\frac{\partial(x)}{\partial x} = A^Tx + x^TA$.

However, I am clearly not understanding it correctly.

The question is, can you somehow make use of the previously known information here in order to simplify the derivation of this expression?

Or am I supposed to approach it differently?


There are 2 best solutions below

0
On BEST ANSWER

Consider a scalar function $\phi$ of two vectors $x$ and $y$:

$$\phi = x^TAy = y^TA^Tx$$

Its differential is

$$d\phi = x^TA\,dy + y^TA^T\,dx$$

Now consider the case $y=x$, so that there is a single vector argument:

$$d\phi = x^TA\,dx + x^TA^T\,dx = x^T(A+A^T)\,dx$$

Depending on which "layout convention" you prefer, the gradient will be either the row vector

$$\frac{\partial\phi}{\partial x} = x^T(A+A^T)$$

or the column vector

$$\frac{\partial\phi}{\partial x} = (A+A^T)x$$
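If you want to convince yourself numerically, here is a minimal sketch in plain Python that compares $(A+A^T)x$ with a central finite-difference estimate of the gradient of $x^TAx$. The helper names (`quad_form`, `analytic_grad`, `numeric_grad`), the size `m = 4`, and the random test data are my own choices for illustration, not part of the derivation above.

```python
import random

def quad_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def analytic_grad(A, x):
    """Return (A + A^T) x, i.e. the gradient in column-vector layout."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

def numeric_grad(A, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of x^T A x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * eps))
    return grad

# Arbitrary (generally non-symmetric) test matrix and test point.
random.seed(0)
m = 4
A = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(m)]
x = [random.uniform(-1, 1) for _ in range(m)]

# Largest componentwise disagreement between the two gradients.
max_err = max(abs(a - b) for a, b in zip(analytic_grad(A, x), numeric_grad(A, x)))
print(max_err)
```

The printed discrepancy should be at the level of floating-point round-off. Note that $A$ is deliberately not symmetric here; for symmetric $A$ the formula reduces to $2Ax$.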

0
On

The gradient of $x^TAx$ can be determined as follows.

Let

$$f(x)=x^TAx$$

thus

$$f(x_0+h)=(x_0+h)^TA(x_0+h)=\langle A(x_0+h),x_0+h\rangle =\langle Ax_0+Ah,x_0+h\rangle=$$

$$\langle Ax_0,x_0\rangle+\langle Ax_0,h\rangle+\langle Ah,x_0\rangle+\langle Ah,h\rangle$$

and note that

  • $\langle Ax_0,x_0\rangle=x_0^TAx_0=f(x_0)$
  • $\langle Ah,h\rangle=h^TAh=\frac12h^TH_f(x_0)h$, which is second order in $h$ (here $H_f(x_0)=A+A^T$ is the Hessian of $f$)

and

$$\langle Ax_0,h\rangle+\langle Ah,x_0\rangle=\langle Ax_0,h\rangle+\langle A^Tx_0,h\rangle=\langle (A+A^T)x_0,h\rangle=$$

$$=\langle \nabla f(x_0),h\rangle$$

thus

$$\nabla f(x_0)=(A+A^T)x_0$$
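As a sanity check on the expansion above, the following sketch (plain Python, with helper names `matvec`, `dot`, `f` and random test data that I made up for illustration) verifies that $f(x_0+h)=f(x_0)+\langle (A+A^T)x_0,h\rangle+\langle Ah,h\rangle$ holds up to round-off for an arbitrary $A$, $x_0$ and $h$.

```python
import random

def matvec(M, v):
    """Matrix-vector product M v, with M given as a list of rows."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def dot(u, v):
    """Standard inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def f(A, x):
    """f(x) = x^T A x = <A x, x>."""
    return dot(matvec(A, x), x)

# Arbitrary test data (illustrative only).
random.seed(1)
m = 3
A = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(m)]
x0 = [random.uniform(-1, 1) for _ in range(m)]
h = [random.uniform(-1, 1) for _ in range(m)]

# Gradient claimed in the derivation: (A + A^T) x0.
A_T = [list(col) for col in zip(*A)]
grad = [a + b for a, b in zip(matvec(A, x0), matvec(A_T, x0))]

# Left-hand side f(x0 + h) versus the three-term expansion.
x0_plus_h = [a + b for a, b in zip(x0, h)]
lhs = f(A, x0_plus_h)
rhs = f(A, x0) + dot(grad, h) + f(A, h)
residual = abs(lhs - rhs)
print(residual)
```

Because $f$ is quadratic, the expansion is exact rather than an approximation, so the residual should be pure floating-point noise regardless of how large $h$ is.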