What are $\nabla f$ and $\nabla ^2 f$ for $$f(X)= b^TX^TXc,$$ where $X \in \mathbb{R}^{n \times n}$ and $b,c \in \mathbb{R}^n\,$?
How to find the gradient and the Hessian of $f(X) = b^TX^TXc\,$?
Alternative approach (in particular for the Hessian, avoiding a fourth-order tensor)
Denote the Frobenius product by a colon $:$ and use its cyclic property \begin{align} {\rm Tr}\left( A^T B C D\right) &:= A: BCD \\ &= AD^T: BC \\ &= B^TA:CD \end{align}
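As a quick sanity check of the three rearrangements above, here is a minimal NumPy sketch. The helper name `frob`, the random test matrices, and the rectangular shapes are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def frob(P, Q):
    """Frobenius product P : Q = Tr(P^T Q), i.e. the sum of elementwise products."""
    return float(np.sum(P * Q))

# Rectangular shapes (assumed) chosen so that every product below is defined.
m, p, r, q = 3, 4, 5, 2
A = rng.standard_normal((m, q))
B = rng.standard_normal((m, p))
C = rng.standard_normal((p, r))
D = rng.standard_normal((r, q))

lhs = np.trace(A.T @ B @ C @ D)   # Tr(A^T B C D)
v1 = frob(A, B @ C @ D)           # A : BCD
v2 = frob(A @ D.T, B @ C)         # A D^T : BC
v3 = frob(B.T @ A, C @ D)         # B^T A : CD

# All four numbers should agree up to floating-point rounding.
all_equal = np.allclose([v1, v2, v3], lhs)
```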
Let \begin{align} f(X) := b^T X^T X c \equiv b: X^T X c. \end{align}
Now we can take the differential and read off the gradient. \begin{align} df &= b : dX^T X c + b : X^T dX\, c \\ &= Xc : dX\,b + Xb : dX\, c \\ &= Xcb^T : dX + Xbc^T : dX \end{align}
The gradient is \begin{align} G:= \frac{\partial f}{\partial X} = X c b^T + X b c^T. \end{align}
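The gradient formula can be checked numerically against central finite differences. This is only a sketch: the size `n`, the random data, and the step `eps` are arbitrary choices, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))
b = rng.standard_normal(n)
c = rng.standard_normal(n)

def f(X):
    """f(X) = b^T X^T X c."""
    return b @ X.T @ X @ c

# Closed-form gradient G = X c b^T + X b c^T.
G = X @ np.outer(c, b) + X @ np.outer(b, c)

# Central differences, one entry of X at a time. Because f is quadratic in X,
# the central difference has no truncation error, only rounding error.
eps = 1e-6
G_fd = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = eps
        G_fd[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

max_err = np.max(np.abs(G - G_fd))
```

`max_err` should be small compared with the entries of `G`; how small depends on `eps` and on floating-point rounding.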
Now, to compute the Hessian, we can vectorize the gradient $G$. Let $A:=(c b^T + b c^T)$, which is symmetric, and let $I$ be the $n \times n$ identity matrix. Using $\operatorname{vec}(PXQ) = \left(Q^T \otimes P\right)\operatorname{vec}(X)$ with $P = I$ and $Q = A$, \begin{align} g :=& \operatorname{vec}(G) = \operatorname{vec}\left(X A\right) \\ =& \operatorname{vec}\left(I X A\right) \\ =& \left( A^T \otimes I \right) \underbrace{\operatorname{vec}(X)}_{ := x}. \end{align}
Taking the differential gives \begin{align} dg &= \left( A^T \otimes I \right) dx, \end{align} so the Hessian with respect to $x = \operatorname{vec}(X)$ reads \begin{align} \frac{\partial g}{\partial x} &= \left( A^T \otimes I \right). \end{align}
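The Kronecker form can be checked with NumPy, assuming $\operatorname{vec}$ stacks columns (column-major order), which is the convention under which $\operatorname{vec}(PXQ) = (Q^T \otimes P)\operatorname{vec}(X)$ holds. The helper `vec` and the random data below are illustrative assumptions. The sketch also uses the fact that $f$ is a pure quadratic form in $x$, so $f = \tfrac12 x^T H x$ when $H$ is the (symmetric) Hessian.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
X = rng.standard_normal((n, n))
b = rng.standard_normal(n)
c = rng.standard_normal(n)

A = np.outer(c, b) + np.outer(b, c)   # A = c b^T + b c^T (symmetric)
I = np.eye(n)
H = np.kron(A.T, I)                   # claimed Hessian, shape (n^2, n^2)

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

x = vec(X)
G = X @ A                             # gradient from the previous step

# vec(G) = (A^T kron I) vec(X)
grad_ok = np.allclose(vec(G), H @ x)

# f has no linear or constant term in X, so f = 1/2 x^T H x.
f_val = b @ X.T @ X @ c
quad_ok = np.isclose(f_val, 0.5 * x @ H @ x)

# A Hessian must be symmetric; here that follows from A being symmetric.
sym_ok = np.allclose(H, H.T)
```

With NumPy's default row-major flattening the roles of the two Kronecker factors would be swapped, so the `order="F"` argument matters here.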
I always find it much more convenient to do matrix calculus using the differential rather than the gradient. For any variation $\delta X\in \mathbb{R}^{n\times n}$ of $X$,
$$df(X, \delta X) = b^T \delta X^T X c + b^T X^T \delta X c$$
and for a pair of variations $\delta X^1, \delta X^2$, the second derivative is
$$d^2f(X, \delta X^1, \delta X^2) = b^T \left(\delta X^1\right)^T\left(\delta X^2\right) c + b^T \left(\delta X^2\right)^T\left(\delta X^1\right) c.$$
Now, if you need the Hessian in coordinates as a rank-four tensor, you can extract the coefficients of $\delta X^1_{ij}\delta X^2_{kl}$ from $d^2f$.
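Expanding $d^2f$ in indices, the coefficient of $\delta X^1_{ij}\delta X^2_{kl}$ works out to $\delta_{ik}\,(b_j c_l + c_j b_l)$, where $\delta_{ik}$ is the Kronecker delta. The sketch below builds that tensor and checks it against the formula for $d^2f$. The array name `T`, the random variations, and the column-major flattening at the end are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
b = rng.standard_normal(n)
c = rng.standard_normal(n)

A = np.outer(b, c) + np.outer(c, b)   # A[j, l] = b_j c_l + c_j b_l
I = np.eye(n)

# T[i, j, k, l] = delta_ik * A[j, l]: coefficient of dX1[i, j] * dX2[k, l].
T = np.einsum("ik,jl->ijkl", I, A)

# Two arbitrary variations of X.
dX1 = rng.standard_normal((n, n))
dX2 = rng.standard_normal((n, n))

# Second differential as written in the answer (it does not depend on X).
d2f = b @ dX1.T @ dX2 @ c + b @ dX2.T @ dX1 @ c

# The same number obtained by contracting the rank-four tensor.
d2f_tensor = np.einsum("ijkl,ij,kl->", T, dX1, dX2)

# Flattening index pairs (i, j) and (k, l) in column-major order gives a matrix
# that should coincide with the Kronecker product A kron I.
H = T.reshape(n * n, n * n, order="F")
matches_kron = np.allclose(H, np.kron(A, I))
```

If `matches_kron` holds, the tensor and the vectorized Hessian from the first answer describe the same bilinear form.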