Calculate the gradient of a linear scalar field

I am trying to calculate the following gradient:

$$\nabla_{\mathbf{X}} \left( \mathbf{a}^{T} \mathbf{X} \mathbf{a} \right)$$

where I am using the convention that $\mathbf{a}$ is a column vector. What are the steps to derive the result given in the Matrix Cookbook, which is:

$$\nabla_{\mathbf{X}} \left( \mathbf{a}^{T} \mathbf{X} \mathbf{a} \right) = \mathbf{a}\cdot\mathbf{a}^{T}$$

There are 2 best solutions below

On BEST ANSWER

See this question for the basics and the notation.

The derivative of the scalar function $f(X)$ with respect to $X$, where $X$ is a matrix, is the matrix $A$ with $A_{i,j}=\dfrac{df(X)}{dX_{i,j}}$.

And here,

$$f(X)=a^TXa=\sum_{i,j} X_{i,j}a_ia_j$$

Each entry $X_{i,j}$ appears in exactly one term of this sum, with coefficient $a_ia_j$, so that

$$\dfrac{df(X)}{dX_{i,j}}=a_ia_j$$

Since $a_ia_j$ is the $(i,j)$ entry of the outer product $aa^T$, we finally get

$$A=\frac{df(X)}{dX}=aa^T$$
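As a sanity check, the entrywise formula can be compared against a finite-difference approximation. The following is only a minimal sketch in plain Python; the helper names (`quad_form`, `numerical_gradient`, `outer`), the step size `h`, and the random test data are illustrative assumptions, not part of the original answer.

```python
import random

def quad_form(a, X):
    """Return f(X) = a^T X a, with a given as a list and X as a list of rows."""
    n = len(a)
    return sum(X[i][j] * a[i] * a[j] for i in range(n) for j in range(n))

def numerical_gradient(a, X, h=1e-6):
    """Approximate df/dX_{i,j} entry by entry using central differences."""
    n = len(a)
    grad = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            grad[i][j] = (quad_form(a, X_plus) - quad_form(a, X_minus)) / (2 * h)
    return grad

def outer(a):
    """Return the outer product a a^T as a list of rows."""
    return [[ai * aj for aj in a] for ai in a]

# Hypothetical test data: a random vector and a random square matrix.
random.seed(0)
n = 3
a = [random.uniform(-1, 1) for _ in range(n)]
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

grad = numerical_gradient(a, X)
expected = outer(a)
max_error = max(abs(grad[i][j] - expected[i][j])
                for i in range(n) for j in range(n))
print("largest deviation from a a^T:", max_error)
```

Because $f$ is linear in $X$, the central difference should agree with $aa^T$ up to floating-point rounding, whatever matrix $X$ is chosen.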

On

$$\begin{array}{l|rcl} f : & M_n(\mathbb R) & \longrightarrow & \mathbb R\\ & X & \longmapsto & a^T X a \end{array}$$

is a linear map.

It is critical to identify the domain and codomain of $f$ in order to understand what kind of function $f$ is.

Since $f$ is linear, its Fréchet derivative at each point is equal to $f$ itself: $f^\prime(X).u =a^T u a$ for every $u \in M_n(\mathbb R)$.
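The claim that the derivative of a linear map is the map itself can be illustrated numerically: the increment $f(X+u)-f(X)$ should equal $f(u)$ exactly, with no higher-order remainder. This is a rough sketch in plain Python; the function name `f` mirrors the notation above, while the dimension and the random data are assumptions made for illustration.

```python
import random

def f(a, X):
    """Return a^T X a, with a given as a list and X as a list of rows."""
    n = len(a)
    return sum(a[i] * X[i][j] * a[j] for i in range(n) for j in range(n))

# Hypothetical test data: a vector a, a base point X and a direction u.
random.seed(1)
n = 4
a = [random.uniform(-1, 1) for _ in range(n)]
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
u = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

X_plus_u = [[X[i][j] + u[i][j] for j in range(n)] for i in range(n)]

increment = f(a, X_plus_u) - f(a, X)   # f(X + u) - f(X)
directional = f(a, u)                  # f'(X).u = a^T u a
gap = abs(increment - directional)
print("difference between increment and a^T u a:", gap)
```

The gap should be at the level of floating-point rounding, and it does not depend on the size of `u`, which is what distinguishes a linear map from a merely differentiable one.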

Following a detailed and interesting discussion with Jean-Claude Arbaut (see the comments!), we can rewrite

$$f^\prime(X).u =a^T u a = \mathrm{tr}(a^T u a) = \mathrm{tr}(u \cdot (a \cdot a^T))= \mathrm{tr}((a \cdot a^T) \cdot u) = \mathrm{tr}(A \cdot u)$$

where $A = a \cdot a^T$ is defined as the matrix calculus derivative of $f$ with respect to $X$. This is in fact what is meant by

$$\nabla_{\mathbf{X}} \left( \mathbf{a}^{T} \mathbf{X} \mathbf{a} \right) = \frac{\partial\left( \mathbf{a}^{T} \mathbf{X} \mathbf{a} \right)}{\partial \mathbf{X}}=\mathbf{a}\cdot\mathbf{a}^{T}$$ in the Matrix Cookbook.
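The trace identity used above, $a^T u a = \mathrm{tr}((a \cdot a^T) \cdot u)$, can also be checked on random data. The sketch below uses only plain Python; the helpers `matmul`, `trace` and `outer` are hypothetical names introduced here for illustration.

```python
import random

def outer(a):
    """Return the outer product a a^T as a list of rows."""
    return [[ai * aj for aj in a] for ai in a]

def matmul(P, Q):
    """Multiply two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(M):
    """Return the sum of the diagonal entries of M."""
    return sum(M[i][i] for i in range(len(M)))

# Hypothetical test data: a vector a and a direction matrix u.
random.seed(2)
n = 4
a = [random.uniform(-1, 1) for _ in range(n)]
u = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

A = outer(a)                                   # A = a a^T
lhs = sum(a[i] * u[i][j] * a[j]
          for i in range(n) for j in range(n))  # a^T u a
rhs = trace(matmul(A, u))                       # tr(A u)
print("a^T u a =", lhs, " tr(A u) =", rhs)
```

Both numbers should agree up to rounding, which is the sense in which $A = a \cdot a^T$ represents the derivative of $f$.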