Let $A \in \mathbb{R}^{k \times p}$. Define $f : \mathbb{R}^{p \times k} \rightarrow \mathbb{R}$ by $f(X) = \log \det(XA + I_{p})$, where $I_{p}$ is the $p \times p$ identity matrix. What are the gradient and Hessian of $f$ with respect to $X$? Thank you!
Gradient and Hessian of function on matrix domain
Asked by Bumbble Comm. There are 2 answers below.
For convenience, define the matrix $M = (I+XA)$.
Now use the Frobenius inner product, $P:Q = {\rm tr}(P^TQ)$, to write the function, its differential, and its gradient as $$\eqalign{ f &= \log(\det(M)) \cr &= {\rm tr}(\log(M)) \cr\cr df &= M^{-T}:dM \cr &= M^{-T}:dX\,A \cr &= M^{-T}A^T:dX \cr\cr \frac{\partial f}{\partial X} &= M^{-T}A^T \cr }$$
Next, take the differential of the gradient $G=M^{-T}A^T$, using $d(M^{-1}) = -M^{-1}\,dM\,M^{-1}$ and $dM = dX\,A$ $$\eqalign{ dG &= (dM^{-T})\,A^T \cr &= -M^{-T}(dM^T)M^{-T}A^T \cr &= -M^{-T}(A^TdX^T)M^{-T}A^T \cr &= -B^T\,dX^T\,B^T \cr } $$ where $B=AM^{-1}$.
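The gradient formula $M^{-T}A^T$ can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names (`f`, `grad_f`, `numerical_grad`), the sizes $p=3$, $k=2$, and the random test matrices are illustrative choices, not part of the original answer.

```python
import numpy as np

def f(X, A):
    """log det(I + X A). slogdet returns log|det|, which equals log det when det > 0."""
    p = X.shape[0]
    _, logabsdet = np.linalg.slogdet(np.eye(p) + X @ A)
    return logabsdet

def grad_f(X, A):
    """Closed-form gradient M^{-T} A^T with M = I + X A (shape p x k)."""
    p = X.shape[0]
    M = np.eye(p) + X @ A
    return np.linalg.solve(M.T, A.T)  # solves M^T G = A^T

def numerical_grad(X, A, eps=1e-6):
    """Entrywise central finite differences of f."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E, A) - f(X - E, A)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
p, k = 3, 2                                # illustrative sizes
A = rng.standard_normal((k, p))
X = 0.1 * rng.standard_normal((p, k))      # small X keeps I + XA well conditioned

max_err = np.max(np.abs(grad_f(X, A) - numerical_grad(X, A)))
print("max abs difference:", max_err)
```

If the formula is right, the printed difference should be at the level of finite-difference noise.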
Define two $4^{th}$-order isotropic tensors $({\mathbb E}, {\mathbb W})$ with components $$\eqalign{ {\mathbb E}_{ijkl} &= \delta_{ik}\delta_{jl} \cr {\mathbb W}_{ijkl} &= \delta_{il}\delta_{jk} \cr } $$ and use them to re-arrange the differential and find the Hessian ${\mathbb H}=\frac{\partial G}{\partial X}$ $$\eqalign{ dG &= -(B^T\,{\mathbb E}\,B) : {\mathbb W} : dX \cr {\mathbb H} &= -(B^T\,{\mathbb E}\,B) : {\mathbb W}\cr } $$ where the colon denotes the double-contraction (aka Frobenius) product.
Note that the Hessian is also a $4^{th}$-order tensor.
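In components, $dG = -B^T\,dX^T\,B^T$ reads $dG_{ij} = -\sum_{a,b} B_{bi}B_{ja}\,dX_{ab}$, so the Hessian entries are ${\mathbb H}_{ijab} = -B_{bi}B_{ja}$. Below is a rough sketch, assuming NumPy, that builds this tensor with `einsum` and compares ${\mathbb H}:dX$ against a finite difference of the gradient. The index letters `a, b`, the helper `gradient`, and the test sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 3, 2                                # illustrative sizes
A = rng.standard_normal((k, p))
X = 0.1 * rng.standard_normal((p, k))

def gradient(X):
    """G = M^{-T} A^T with M = I + X A."""
    M = np.eye(p) + X @ A
    return np.linalg.solve(M.T, A.T)

M = np.eye(p) + X @ A
B = A @ np.linalg.inv(M)                   # B = A M^{-1}, shape (k, p)

# H[i, j, a, b] = dG_ij / dX_ab = -B[b, i] * B[j, a]
H = -np.einsum("bi,ja->ijab", B, B)        # shape (p, k, p, k)

dX = 1e-5 * rng.standard_normal((p, k))    # small perturbation direction
dG_tensor = np.einsum("ijab,ab->ij", H, dX)        # H : dX
dG_matrix = -B.T @ dX.T @ B.T                      # closed form from the derivation
dG_fd = (gradient(X + dX) - gradient(X - dX)) / 2  # central difference

rel_err = np.max(np.abs(dG_tensor - dG_fd)) / np.max(np.abs(dG_fd))
print("relative error vs finite difference:", rel_err)
print("symmetric under (i,j) <-> (a,b):", np.allclose(H, H.transpose(2, 3, 0, 1)))
```

Storing the full tensor costs $(pk)^2$ numbers; for large problems it is usually cheaper to apply the map $dX \mapsto -B^T\,dX^T\,B^T$ directly.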
Let $f(X)=\log(|\det(I+XA)|)$; we calculate the derivative $Df_X$ at a point $X$ such that $I+XA$ is invertible, that is, $-1$ is not an eigenvalue of $XA$.
$Df_X:H\in M_{p,k}\mapsto {\rm tr}(HA(I+XA)^{-1})={\rm tr}((I+XA)^{-T}A^TH^T)$, or
$Df_X(H)=\langle (I+XA)^{-T}A^T,H\rangle$ (the scalar product over the matrices). In other words, the gradient of $f$ is $\nabla(f)(X)=(I+XA)^{-T}A^T$, which is lynn's result.
The Hessian is the symmetric bilinear form
$Hess(f)(X):(H,K)\in M_{p,k}\times M_{p,k}\mapsto -{\rm tr}(HA(I+XA)^{-1}KA(I+XA)^{-1})$, which is equivalent to
$\dfrac{\partial^2f}{\partial x_{i,j}\partial x_{k,l}}=-{\rm tr}(E_{i,j}A(I+XA)^{-1}E_{k,l}A(I+XA)^{-1})$, where $X=[x_{i,j}]$ and $E_{i,j}=e_ie_j^T$ (here the indices $k,l$ label an entry of $X$ and are unrelated to the dimension $k$).
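As a quick check of the bilinear form, the sketch below (assuming NumPy; the names `hess` and `unit`, the step `t`, and the test sizes are illustrative) compares $-{\rm tr}(HBKB)$, with $B=A(I+XA)^{-1}$, against a mixed second-order finite difference of $f$. It also evaluates one entry with unit matrices, where the trace collapses to $-B_{ja}B_{bi}$ for $E_{i,j}$ and $E_{a,b}$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, k = 3, 2                                # illustrative sizes
A = rng.standard_normal((k, p))
X = 0.1 * rng.standard_normal((p, k))

def f(X):
    # slogdet returns (sign, log|det|); keep log|det(I + XA)|
    return np.linalg.slogdet(np.eye(p) + X @ A)[1]

B = A @ np.linalg.inv(np.eye(p) + X @ A)   # A (I + XA)^{-1}, shape (k, p)

def hess(H, K):
    """Bilinear form -tr(H B K B)."""
    return -np.trace(H @ B @ K @ B)

# Two random unit-norm directions
H = rng.standard_normal((p, k)); H /= np.linalg.norm(H)
K = rng.standard_normal((p, k)); K /= np.linalg.norm(K)

# Mixed central difference approximating D^2 f(X)[H, K]
t = 1e-4
fd = (f(X + t*H + t*K) - f(X + t*H - t*K)
      - f(X - t*H + t*K) + f(X - t*H - t*K)) / (4 * t * t)
fd_err = abs(fd - hess(H, K))
sym_err = abs(hess(H, K) - hess(K, H))     # symmetry follows from cyclicity of the trace

def unit(i, j):
    """E_{i,j} = e_i e_j^T of shape (p, k), with 0-based indices."""
    E = np.zeros((p, k))
    E[i, j] = 1.0
    return E

# One Hessian entry: -tr(E_ij B E_ab B) = -B[j, a] * B[b, i]
i, j, a, b = 2, 1, 0, 1
entry_err = abs(hess(unit(i, j), unit(a, b)) - (-B[j, a] * B[b, i]))

print("finite-difference error:", fd_err)
print("symmetry error:", sym_err)
print("entry formula error:", entry_err)
```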